Update documentation

2025-09-15 21:46:53 +00:00 · 2020-09-02 00:18:55 +02:00
parent 4309bd8dfd
commit 145c308461
2 changed files with 21 additions and 6 deletions
--- a/website/elm/Feature.elm
+++ b/website/elm/Feature.elm
@@ -67,7 +67,7 @@ Text is extracted from all files. For scanned documents/images, OCR is used by u
    , { image = "img/analyze-feature.png"
      , header = "Text Analysis"
      , description = """
-The extracted text is analyzed and is used to find properties that can be annotated to your documents automatically.
+The extracted text is analyzed using ML techniques to find properties that can be annotated to your documents automatically.
 """
      }
    , { image = "img/filetype-feature.svg"
--- a/website/site/content/docs/webapp/metadata.md
+++ b/website/site/content/docs/webapp/metadata.md
@@ -33,11 +33,26 @@ workflows, a tag category *state* may exist that includes tags like
 "assignment" semantics. Docspell doesn't propose any workflow, but it
 can help to implement some.
-The tags are *not* taken into account when creating suggestions from
-analyzed text yet. However, PDF files may contain metadata itself and
-if there is a metadata *keywords* list, these keywords are matched
-against the tags in the database. If they match, the item is tagged
-automatically.
+Docspell can try to predict a tag for new incoming documents
+automatically based on your existing data. This requires training an
+algorithm. There are some caveats: the more data you have correctly
+tagged, the better the results. So it won't work well for maybe
+the first 100 documents. Also, the tags must somehow relate to a
+pattern in the document text. Tags like *todo* or *waiting* probably
+won't work, obviously. But the typical "document type" tag, like
+*invoice* or *receipt*, is a good fit! That is why you need to provide
+a tag category, so that only sensible tags are learned. The algorithm
+goes through all your items and learns patterns in the text that
+relate to the given tags. This training step can be run periodically,
+as specified in your collective settings, so that Docspell keeps
+learning from your already tagged data! More information about the
+algorithm can be found in the config, where it is possible to
+fine-tune this process.
+Another way to have items tagged automatically is when an input PDF
+file contains a list of keywords in its metadata section (this only
+applies to PDF files). These keywords are then matched against the
+tags in the database. If they match, the item is tagged with them.
 ## Organization and Person
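
Note on the tag-prediction paragraph added to metadata.md: the idea of learning only tags from one category and then predicting a tag from document text can be illustrated with a small sketch. This is not Docspell's actual algorithm (the config describes that); it swaps in a plain naive Bayes word-count classifier, and every name here (`tokenize`, `train`, `predict`, the sample items) is hypothetical and chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep purely alphabetic whitespace-separated words.
    return [w for w in text.lower().split() if w.isalpha()]


def train(items, category_tags):
    """Learn word counts per tag from (text, tags) pairs.

    Only tags in `category_tags` are learned, mirroring the idea of
    restricting training to one tag category (assumption: e.g. a
    "document type" category).
    """
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, tags in items:
        for tag in tags:
            if tag in category_tags:
                doc_counts[tag] += 1
                word_counts[tag].update(tokenize(text))
    return word_counts, doc_counts


def predict(model, text):
    """Return the most likely learned tag, or None if nothing was learned."""
    word_counts, doc_counts = model
    if not doc_counts:
        return None
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_tag, best_score = None, -math.inf
    for tag, n_docs in doc_counts.items():
        total_words = sum(word_counts[tag].values())
        score = math.log(n_docs / total_docs)
        for w in tokenize(text):
            # Laplace smoothing so unseen words do not zero out a tag.
            score += math.log(
                (word_counts[tag][w] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Hypothetical, already tagged items; "todo" and "waiting" are workflow
# tags outside the chosen category and are therefore ignored.
items = [
    ("invoice number total amount due payment", {"invoice", "todo"}),
    ("invoice amount due within thirty days", {"invoice"}),
    ("receipt thank you for your purchase cash paid", {"receipt", "waiting"}),
    ("receipt store purchase paid by card", {"receipt"}),
]
model = train(items, category_tags={"invoice", "receipt"})
print(predict(model, "amount due on this invoice"))
```

With only four training items the result is fragile, which matches the caveat in the text that predictions need a fair amount of correctly tagged data before they become useful.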