From 145c3084614f199269c64b92a6dcaf1f79d950ec Mon Sep 17 00:00:00 2001
From: Eike Kettner <eike.kettner@posteo.de>
Date: Wed, 2 Sep 2020 00:18:55 +0200
Subject: [PATCH] Update documentation

---
 website/elm/Feature.elm                      |  2 +-
 website/site/content/docs/webapp/metadata.md | 25 ++++++++++++++++----
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/website/elm/Feature.elm b/website/elm/Feature.elm
index 246aa7ad..4d2fb734 100644
--- a/website/elm/Feature.elm
+++ b/website/elm/Feature.elm
@@ -67,7 +67,7 @@ Text is extracted from all files. For scanned documents/images, OCR is used by u
     , { image = "img/analyze-feature.png"
       , header = "Text Analysis"
       , description = """
-The extracted text is analyzed and is used to find properties that can be annotated to your documents automatically.
+The extracted text is analyzed using machine learning techniques to find properties that can be annotated to your documents automatically.
 """
       }
     , { image = "img/filetype-feature.svg"
diff --git a/website/site/content/docs/webapp/metadata.md b/website/site/content/docs/webapp/metadata.md
index 36e5d57c..0f5e23b2 100644
--- a/website/site/content/docs/webapp/metadata.md
+++ b/website/site/content/docs/webapp/metadata.md
@@ -33,11 +33,26 @@ workflows, a tag category *state* may exist that includes tags like
 "assignment" semantics. Docspell doesn't propose any workflow, but it
 can help to implement some.
 
-The tags are *not* taken into account when creating suggestions from
-analyzed text yet. However, PDF files may contain metadata itself and
-if there is a metadata *keywords* list, these keywords are matched
-against the tags in the database. If they match, the item is tagged
-automatically.
+Docspell can try to predict a tag for new incoming documents
+automatically based on your existing data. This requires training an
+algorithm. There are some caveats: the more correctly tagged data you
+have, the better the results, so it probably won't work well for the
+first 100 documents or so. Also, the tags must somehow relate to a
+pattern in the document text. Tags like *todo* or *waiting* probably
+won't work. But a typical "document type" tag, such as *invoice* or
+*receipt*, is a good fit! That is why you need to provide a tag
+category, so that only sensible tags are learned. The algorithm
+goes through all your items and learns patterns in the text that
+relate to the given tags. This training step can be run periodically,
+as specified in your collective settings, so that Docspell keeps
+learning from your already tagged data! More information about the
+algorithm can be found in the config, where it is possible to
+fine-tune this process.
+
+Items can also be tagged automatically when an input PDF file
+contains a list of keywords in its metadata section (this only
+applies to PDF files). These keywords are then matched against the
+tags in the database. If they match, the item is tagged with them.
 
 
 ## Organization and Person