Update documentation for text analysis

Eike Kettner
2021-01-21 20:06:53 +01:00
parent 9957c3267e
commit 021ac568ae
2 changed files with 32 additions and 13 deletions

@@ -363,12 +363,17 @@ related to a tag, a correspondent etc.
When a new document arrives, this model is used to ask for what
metadata (tag, correspondent, etc) it thinks is likely to apply here.
-Training the model is a rather resource-intensive process, but using
-an existing model is quite cheap and fast. A model is trained
-periodically; the schedule can be defined in your collective settings.
-For tags, you can define the tag categories that should be trained (or
-that should not be trained). Docspell assigns one tag from all tags in
-a category to a new document.
+Training the model is a rather resource-intensive process. How much
+memory is needed depends on the number of documents to learn from and
+the size of text to consider. Both can be limited in the config file.
+The default values might require a heap of 1.4G if you have many
+large documents. The maximum text length is about 8000 characters; if
+*all* your documents are that large, adjusting these values might
+be necessary. But using an existing model is quite cheap and fast. A
+model is trained periodically; the schedule can be defined in your
+collective settings. For tags, you can define the tag categories that
+should be trained (or that should not be trained). Docspell assigns
+one tag from all tags in a category to a new document.
Note that tags that cannot be derived from the text alone should
probably be excluded from learning. For example, if you tag all your