mirror of
https://github.com/TheAnachronism/docspell.git
synced 2025-06-22 02:18:26 +00:00
Update documentation
@@ -20,6 +20,9 @@ The configuration of both components uses separate namespaces. The
configuration for the REST server is below `docspell.server`, while
the one for joex is below `docspell.joex`.

You can therefore use two separate config files or one single file
containing both namespaces.
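As a minimal sketch of the single-file variant, both namespaces can sit side by side in one file. Only the two namespace names come from the text above; the comments stand in for real settings, which are listed in the default configuration.

```conf
# Single config file containing both namespaces (illustrative skeleton).
docspell.server {
  # settings for the REST server go here
}

docspell.joex {
  # settings for joex go here
}
```

With two separate files, each file would carry only one of these blocks.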
## JDBC

This configures the connection to the database. This has to be
@@ -281,6 +284,56 @@ just some minutes, the web application obtains new ones
periodically. So a short time is recommended.


## File Processing

Files are processed by the joex component, so all the related
configuration belongs to the joex config only.

File processing involves several stages; detailed information can be
found [here](@/docs/joex/file-processing.md#text-analysis).

The configuration lets you define the external tools and set some
limits to control memory usage. The sections are:

- `docspell.joex.extraction`
- `docspell.joex.text-analysis`
- `docspell.joex.convert`

Options to external commands can use variables that are replaced by
values at runtime. Variables are enclosed in double braces `{{…}}`.
Please see the default configuration for more details.
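To illustrate the variable syntax, here is a hedged sketch of how an external command might be declared. The key names (`program`, `args`), the tool name and the variable names `{{infile}}` and `{{outfile}}` are assumptions made for illustration only; the default configuration remains the authoritative list of keys and available variables.

```conf
# Hypothetical command block: key, tool and variable names are assumptions.
command {
  program = "some-tool"
  # Each {{...}} placeholder is replaced by a concrete value at runtime.
  args = ["--input", "{{infile}}", "--output", "{{outfile}}"]
}
```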
### `text-analysis.nlp.mode`

This setting defines which NLP mode to use. It defaults to `full`,
which requires more memory for certain languages (with the advantage
of better results). Other values are `basic`, `regexonly` and
`disabled`. The modes `full` and `basic` use pre-defined language
models for processing documents in German, English and French.
These require some amount of memory (see below).

The mode `basic` is a "light" variant of `full`. It doesn't use
all NLP features, which makes memory consumption much lower, but comes
at the cost of less accurate results.

The mode `regexonly` doesn't use pre-defined language models, even if
available. It checks your address book against a document to find
metadata, which makes it language independent. Also, when using
`full` or `basic` with languages for which no pre-defined models exist,
it degrades to `regexonly` for these.

The mode `disabled` skips NLP processing completely. This has the
lowest memory consumption, but then only the classifier
is used to find metadata.

You might want to try different modes and see which combination best
suits your usage pattern and the machine running joex. If a powerful
machine is used, simply leave the defaults. When running on an older
Raspberry Pi, for example, you might need to adjust things. The
corresponding sections in the [joex default config](#joex) and the [file
processing](@/docs/joex/file-processing.md#text-analysis) page provide more
details.
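As a sketch, switching the NLP mode in the joex config could look like the following. The nesting is inferred from the setting name `text-analysis.nlp.mode` inside the `docspell.joex` namespace, and choosing `basic` is only an example for a memory-constrained machine.

```conf
docspell.joex {
  text-analysis {
    nlp {
      # one of: full (default), basic, regexonly, disabled
      mode = basic
    }
  }
}
```

Leaving the setting out keeps the default, `full`.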
# File Format

The format of the configuration files can be