Make the text length limit optional

2025-08-05 02:24:52 +00:00 · 2021-01-22 22:56:51 +01:00
parent 8dd1672c8c
commit c7e850116f
5 changed files with 35 additions and 17 deletions
--- a/website/site/content/docs/configure/_index.md
+++ b/website/site/content/docs/configure/_index.md
@@ -312,9 +312,9 @@ most should be used for learning. The default settings should work
 well for most cases. However, it always depends on the amount of data
 and the machine that runs joex. For example, by default the documents
 to learn from are limited to 600 (`classification.item-count`) and
-every text is cut after 8000 characters (`text-analysis.max-length`).
+every text is cut after 5000 characters (`text-analysis.max-length`).
 This is fine if *most* of your documents are small and only a few are
-near 8000 characters). But if *all* your documents are very large, you
+near 5000 characters. But if *all* your documents are very large, you
 probably need to either assign more heap memory or go down with the
 limits.
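
To make the two limits in this hunk concrete, here is a minimal Python sketch (not Docspell code) of cutting each text at a maximum length and of the worst-case amount of text that then goes into learning. The function names are hypothetical. The diff does not show how Docspell represents "no limit", so treating `None` or a non-positive value as unlimited is an assumption made only for this illustration.

```python
from typing import Optional


def cut_text(text: str, max_length: Optional[int]) -> str:
    """Cut text after max_length characters.

    Assumption for this sketch: None or a non-positive value means
    "no limit", mirroring the idea of an optional length limit.
    """
    if max_length is None or max_length <= 0:
        return text
    return text[:max_length]


def worst_case_chars(item_count: int, max_length: int) -> int:
    """Upper bound on characters to learn from when every document hits the limit."""
    return item_count * max_length


if __name__ == "__main__":
    # Documented defaults: 600 documents, 5000 characters each.
    print(worst_case_chars(600, 5000))
    print(len(cut_text("x" * 12000, 5000)))
    print(len(cut_text("x" * 12000, None)))
```

With the documented defaults, the upper bound is 600 × 5000 = 3,000,000 characters, which is only reached if every document is at least 5000 characters long. Lowering either value shrinks that bound proportionally.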

--- a/website/site/content/docs/joex/file-processing.md
+++ b/website/site/content/docs/joex/file-processing.md
@@ -367,13 +367,15 @@ Training the model is a rather resource intensive process. How much
 memory is needed, depends on the number of documents to learn from and
 the size of text to consider. Both can be limited in the config file.
 The default values might require a heap of 1.4G if you have many and
-large documents. The maximum text length is about 8000 characters, if
+large documents. The maximum text length is set to 5000 characters. If
 *all* your documents would be that large, adjusting these values might
-be necessary. But using an existing model is quite cheap and fast. A
-model is trained periodically, the schedule can be defined in your
-collective settings. For tags, you can define the tag categories that
-should be trained (or that should not be trained). Docspell assigns
-one tag from all tags in a category to a new document.
+be necessary. A model is trained periodically; the schedule can be
+defined in your collective settings. Although learning is resource
+intensive, using an existing model is quite cheap and fast.
+
+For tags, you can define the tag categories that should be trained (or
+that should not be trained). Docspell assigns one tag (or none) from
+all tags in a category to a new document.

 Note that tags that can not be derived from the text only, should
 probably be excluded from learning. For example, if you tag all your