Update documentation

2025-08-05 02:24:52 +00:00 · 2021-01-20 21:35:54 +01:00
parent 85ddc61d9d
commit a6c31be22f
6 changed files with 206 additions and 93 deletions
--- a/website/site/content/docs/configure/_index.md
+++ b/website/site/content/docs/configure/_index.md
@@ -20,6 +20,9 @@ The configuration of both components uses separate namespaces. The
 configuration for the REST server is below `docspell.server`, while
 the one for joex is below `docspell.joex`.

+You can therefore use two separate config files or one single file
+containing both namespaces.
+
 ## JDBC

 This configures the connection to the database. This has to be
@@ -281,6 +284,56 @@ just some minutes, the web application obtains new ones
 periodically. So a short time is recommended.


+## File Processing
+
+Files are processed by the joex component, so all related
+configuration belongs in the joex config only.
+
+File processing involves several stages; detailed information can be
+found [here](@/docs/joex/file-processing.md#text-analysis).
+
+The configuration lets you define the external tools and set some
+limits to control memory usage. The sections are:
+
+- `docspell.joex.extraction`
+- `docspell.joex.text-analysis`
+- `docspell.joex.convert`
+
+Options for external commands can use variables that are replaced by
+values at runtime. Variables are enclosed in double braces `{{…}}`.
+Please see the default configuration for more details.
+
+### `text-analysis.nlp.mode`
+
+This setting defines which NLP mode to use. It defaults to `full`,
+which requires more memory for certain languages (with the advantage
+of better results). Other values are `basic`, `regexonly` and
+`disabled`. The modes `full` and `basic` use pre-defined language
+models for processing documents in German, English and French.
+These require some amount of memory (see below).
+
+The mode `basic` is a "light" variant of `full`. It doesn't use
+all NLP features, which makes memory consumption much lower, but comes
+at the cost of less accurate results.
+
+The mode `regexonly` doesn't use pre-defined language models, even if
+available. It checks your address book against a document to find
+metadata. That means it is language independent. Also, when using
+`full` or `basic` with languages where no pre-defined models exist,
+processing degrades to `regexonly` for these.
+
+The mode `disabled` skips NLP processing completely. This has the
+least impact on memory consumption, but then only the classifier
+is used to find metadata.
+
+You might want to try different modes and see which one best suits
+your usage pattern and the machine running joex. If a powerful
+machine is used, simply keep the defaults. When running on an older
+Raspberry Pi, for example, you might need to adjust things. The
+corresponding sections in [joex default config](#joex) and the [file
+processing](@/docs/joex/file-processing.md#text-analysis) page provide more
+details.
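+
+As a minimal sketch, assuming the setting is located at
+`docspell.joex.text-analysis.nlp.mode` (the path is inferred from the
+section names above; please verify it against the default
+configuration), a low-memory setup could look like this:
+
+```
+docspell.joex {
+  text-analysis {
+    nlp {
+      # one of: full (default), basic, regexonly, disabled
+      mode = "basic"
+    }
+  }
+}
+```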
+
 # File Format

 The format of the configuration files can be