Update documentation

Eike Kettner
2021-01-20 21:35:54 +01:00
parent 85ddc61d9d
commit a6c31be22f
6 changed files with 206 additions and 93 deletions


@@ -25,19 +25,18 @@ work is done by the joex components.
 Running the joex component on the Raspberry Pi is possible, but will
 result in long processing times for OCR and text analysis. The board
 should provide 4G of RAM (like the current RPi4), especially if a
-database and solr are also running next to it. I recommend giving joex
-a heap of 1.5G (`-J-Xmx1536M`). You should also set the joex pool size
-to 1.
-When joex processes the first file, some models are built and loaded
-into memory, which can take a while. Subsequent files are processed
-faster.
+database and solr are also running next to it. The memory required by
+joex depends on the config and the document language. Please pick a
+value that suits your setup from the [memory usage
+table](@/docs/install/running.md#memory-usage). For boards like the
+RPi, it might be necessary to use `nlp.mode=basic` rather than
+`nlp.mode=full`. You should also set the joex pool size to 1.
 An example: on this [UP
 board](https://up-board.org/up/specifications/) with an Intel Atom
-x5-Z8350 CPU (@1.44GHz) and 4G RAM, a scanned (300dpi) PDF file with 6
-pages took *3:20 min* to process. This board also runs SOLR and a
-PostgreSQL database.
+x5-Z8350 CPU (@1.44GHz) and 4G RAM, a scanned (300dpi, in German) PDF
+file with 6 pages took *3:20 min* to process. This board also runs
+SOLR and a PostgreSQL database.
 The same file was processed in 55s on a QEMU virtual machine on my i7
 notebook, using 1 CPU and 4G RAM (and identical config for joex). The

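For small boards, the change above comes down to two settings: `nlp.mode` and the joex heap. The Memory Usage section added in this commit recommends a heap of at least 1400M for `nlp.mode=full` and at least 500M for `nlp.mode=basic`. Below is a minimal sketch of a shell helper that turns the mode into the matching start-script option. The function name `joex_heap_flag` is hypothetical and not part of docspell. The `-J` prefix passes the option through the start script to the JVM, as in the existing `-J-Xmx1536M` and `-J-Xmx1596M` examples. This commit does not show how `nlp.mode` and the pool size are set in `joex.conf`, so check the joex configuration reference for the exact keys.

```shell
#!/bin/sh
# Hypothetical helper (not part of docspell): prints the minimum heap
# option recommended for a given joex nlp.mode, or nothing if the docs
# give no recommendation for that value.
joex_heap_flag() {
  case "$1" in
    full)  printf '%s\n' "-J-Xmx1400M" ;;  # at least 1400M for nlp.mode=full
    basic) printf '%s\n' "-J-Xmx500M" ;;   # at least 500M for nlp.mode=basic
    *)     ;;                              # no recommendation: keep the defaults
  esac
}

# Usage on a board like the RPi, assuming nlp.mode=basic and a pool size
# of 1 are already set in joex.conf (the start-script path is a placeholder):
#   ./docspell-joex-*/bin/docspell-joex $(joex_heap_flag basic) -J-XX:+UseG1GC -- /path/to/joex.conf
```

Keying on the mode alone is a simplification. Documents in languages that cannot use these modes need less memory, so for them the printed value may over-provision the heap.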

@@ -35,6 +35,42 @@ You should be able to create a new account and sign in. Check the
 [configuration page](@/docs/configure/_index.md) to further customize
 docspell.
+## Memory Usage
+The memory requirements for the joex component depend on the document
+language and the configuration for [file
+processing](@/docs/configure/_index.md#file-processing). The
+`nlp.mode` setting has a significant impact, especially when your
+documents are in German. Here are some rough numbers on JVM heap usage
+(the same small JPEG file was used for all runs):
+<table class="table is-hoverable is-striped">
+<thead>
+<tr><th>nlp.mode</th><th>English</th><th>German</th><th>French</th></tr>
+</thead>
+<tfoot>
+</tfoot>
+<tbody>
+<tr><td>full</td><td>420M</td><td>950M</td><td>490M</td></tr>
+<tr><td>basic</td><td>170M</td><td>380M</td><td>390M</td></tr>
+</tbody>
+</table>
+When using `nlp.mode=full`, a heap setting of at least `-Xmx1400M` is
+recommended. For `nlp.mode=basic`, a heap setting of at least
+`-Xmx500M` is recommended.
+Other languages can't use these two modes, so they don't require this
+amount of memory (but their text analysis results are not as good).
+For them, a smaller heap is sufficient.
+More details about these modes can be found
+[here](@/docs/joex/file-processing.md#text-analysis).
+The restserver component is very lightweight; the default settings
+are sufficient.
 ## Options
@@ -65,10 +101,10 @@ $ ./docspell-restserver*/bin/docspell-restserver -h
 gives an overview of supported options.
-It is recommended to run joex with 1.5G heap space or more and with
-the G1GC enabled. If you use Java 8, you need to add an option to use
-G1GC; for Java 11 this is not necessary (but doesn't hurt either).
-This could look like this:
+It is recommended to run joex with the G1 garbage collector (G1GC)
+enabled. If you use Java 8, you need to add an option to enable it
+(`-XX:+UseG1GC`); for Java 11 this is not necessary (but doesn't hurt
+either). This could look like this:
 ```
 ./docspell-joex-{{version()}}/bin/docspell-joex -J-Xmx1596M -J-XX:+UseG1GC -- /path/to/joex.conf