Update documentation

Eike Kettner
2021-01-20 21:35:54 +01:00
parent 85ddc61d9d
commit a6c31be22f
6 changed files with 206 additions and 93 deletions

@@ -20,6 +20,9 @@ The configuration of both components uses separate namespaces. The
configuration for the REST server is below `docspell.server`, while
the one for joex is below `docspell.joex`.
You can therefore use two separate config files or one single file
containing both namespaces.
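
As a sketch, a single file holding both namespaces could look like this. The contents of each block are elided here. The actual keys are the ones documented in the respective default configurations.

```conf
# One config file containing both namespaces.
# The settings inside each block are elided.
docspell.server {
  # REST server settings go here
}

docspell.joex {
  # joex settings go here
}
```

When two separate files are used instead, each file contains only its own namespace.
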
## JDBC
This configures the connection to the database. This has to be
@@ -281,6 +284,56 @@ just some minutes, the web application obtains new ones
periodically. So a short time is recommended.
## File Processing
Files are processed by the joex component, so all of the respective
configuration is in the joex config only.

File processing involves several stages. Detailed information can be
found on the [file processing](@/docs/joex/file-processing.md#text-analysis)
page.

The configuration allows defining the external tools and setting some
limits to control memory usage. The sections are:
- `docspell.joex.extraction`
- `docspell.joex.text-analysis`
- `docspell.joex.convert`
Options to external commands can use variables that are replaced by
values at runtime. Variables are enclosed in double braces `{{…}}`.
Please see the default configuration for more details.
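
The following illustration shows what an external command using such variables might look like. It is modeled on the tesseract OCR command in the joex default configuration. The key names and the variable names `{{file}}` and `{{lang}}` are recalled from that default configuration and are assumptions here, so check them against the default config before relying on them.

```conf
docspell.joex {
  extraction {
    ocr {
      tesseract {
        command {
          program = "tesseract"
          # Assumed variables: {{file}} and {{lang}} are replaced at
          # runtime with the input file and the document language.
          args = ["{{file}}", "stdout", "-l", "{{lang}}"]
          timeout = "5 minutes"
        }
      }
    }
  }
}
```
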
### `text-analysis.nlp.mode`
This setting defines which NLP mode to use. It defaults to `full`,
which requires more memory for certain languages (with the advantage
of better results). Other values are `basic`, `regexonly` and
`disabled`. The modes `full` and `basic` use pre-defined language
models for processing documents in German, English and French.
These require some amount of memory (see below).

The mode `basic` is like a "light" variant of `full`. It doesn't use
all NLP features, which makes memory consumption much lower, but comes
with the compromise of less accurate results.

The mode `regexonly` doesn't use pre-defined language models, even if
they are available. It checks your address book against a document to
find metadata, which makes it language independent. Also, when `full`
or `basic` is used with a language for which no pre-defined models
exist, processing degrades to `regexonly` for that language.

The mode `disabled` skips NLP processing completely. This has the
least impact on memory consumption, but then only the classifier is
used to find metadata.

You might want to try different modes and see which combination best
suits your usage pattern and the machine running joex. If a powerful
machine is used, simply leave the defaults. When running on an older
Raspberry Pi, for example, you might need to adjust things. The
corresponding sections in the [joex default config](#joex) and the
[file processing](@/docs/joex/file-processing.md#text-analysis) page
provide more details.
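
For example, to trade some accuracy for lower memory use on a smaller machine, the mode could be set to `basic` in the joex config. `regexonly` and `disabled` reduce memory use further. This is a minimal sketch showing only the relevant key:

```conf
docspell.joex {
  text-analysis {
    nlp {
      # One of: full (default), basic, regexonly, disabled
      mode = basic
    }
  }
}
```

Assuming the HOCON format of the default configuration, the same setting can also be written on a single line as `docspell.joex.text-analysis.nlp.mode = basic`.
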
# File Format
The format of the configuration files can be