mirror of
https://github.com/TheAnachronism/docspell.git
synced 2025-06-22 02:18:26 +00:00
Update documentation
@@ -25,19 +25,18 @@ work is done by the joex components.
 Running the joex component on the Raspberry Pi is possible, but will
 result in long processing times for OCR and text analysis. The board
 should provide 4G of RAM (like the current RPi4), especially if also a
-database and solr are running next to it. I recommend to give joex a
-heap of 1.5G (`-J-Xmx1536M`). You should also set the joex pool size
-to 1.
-
-When joex processes the first file, some models are built loaded into
-memory which can take a while. Subsequent processing times are faster
-then.
+database and solr are running next to it. The memory required by joex
+depends on the config and document language. Please pick a value that
+suits your setup from [here](@/docs/install/running.md#memory-usage).
+For boards like the RPi, it might be necessary to use
+`nlp.mode=basic`, rather than `nlp.mode=full`. You should also set the
+joex pool size to 1.
 
 An example: on this [UP
 board](https://up-board.org/up/specifications/) with an Intel Atom
-x5-Z8350 CPU (@1.44Ghz) and 4G RAM, a scanned (300dpi) pdf file with 6
-pages took *3:20 min* to process. This board also runs the SOLR and a
-postgresql database.
+x5-Z8350 CPU (@1.44Ghz) and 4G RAM, a scanned (300dpi, in German) pdf
+file with 6 pages took *3:20 min* to process. This board also runs the
+SOLR and a postgresql database.
 
 The same file was processed in 55s on a qemu virtual machine on my i7
 notebook, using 1 CPU and 4G RAM (and identical config for joex). The
@@ -35,6 +35,42 @@ You should be able to create a new account and sign in. Check the
 [configuration page](@/docs/configure/_index.md) to further customize
 docspell.
 
+## Memory Usage
+
+The memory requirements for the joex component depends on the document
+language and the configuration for [file
+processing](@/docs/configure/_index.md#file-processing). The
+`nlp.mode` setting has significant impact, especially when your
+documents are in German. Here are some rough numbers on jvm heap usage
+(the same small jpeg file was used for all tries):
+
+<table class="table is-hoverable is-striped">
+<thead>
+<tr><th>nlp.mode</th><th>English</th><th>German</th><th>French</th></tr>
+</thead>
+<tfoot>
+</tfoot>
+<tbody>
+<tr><td>full</td><td>420M</td><td>950M</td><td>490M</td></tr>
+<tr><td>basic</td><td>170M</td><td>380M</td><td>390M</td></tr>
+</tbody>
+</table>
+
+When using `mode=full`, a heap setting of at least `-Xmx1400M` is
+recommended. For `mode=basic` a heap setting of at least `-Xmx500M` is
+recommended.
+
+Other languages can't use these two modes, and so don't require this
+amount of memory (but don't have as good results). Then you can go
+with less heap.
+
+More details about these modes can be found
+[here](@/docs/joex/file-processing.md#text-analysis).
+
+
+The restserver component is very lightweight, here you can use
+defaults.
+
 
 ## Options
 
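As a rough illustration of the heap recommendations in the added Memory Usage section, the mapping from `nlp.mode` to a minimum heap flag could be sketched in shell as below. The helper name `joex_heap_flag` is hypothetical and not part of docspell. The `-J` prefix follows the `-J-Xmx...` form used elsewhere in these docs, and the values are the stated minimums, not tuned settings.

```shell
#!/bin/sh
# Sketch only: map an nlp.mode value to the minimum heap flag recommended
# in the Memory Usage section. `joex_heap_flag` is a hypothetical helper,
# not something shipped with docspell.
joex_heap_flag() {
  case "$1" in
    full)  echo "-J-Xmx1400M" ;;  # mode=full: at least -Xmx1400M
    basic) echo "-J-Xmx500M" ;;   # mode=basic: at least -Xmx500M
    *)     echo "unknown nlp.mode: $1" >&2; return 1 ;;
  esac
}
```

Under these assumptions, a start command for a small board might look like `./docspell-joex-{{version()}}/bin/docspell-joex "$(joex_heap_flag basic)" -J-XX:+UseG1GC -- /path/to/joex.conf`. The `nlp.mode` setting itself still has to be changed in the joex config file.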
@@ -65,10 +101,10 @@ $ ./docspell-restserver*/bin/docspell-restserver -h
 
 gives an overview of supported options.
 
-It is recommended to run joex with 1.5G heap space or more and with
-the G1GC enabled. If you use java8, you need to add an option to use
-G1GC, for java11 this is not necessary (but doesn't hurt either). This
-could look like this:
+It is recommended to run joex with the G1GC enabled. If you use java8,
+you need to add an option to use G1GC (`-XX:+UseG1GC`), for java11
+this is not necessary (but doesn't hurt either). This could look like
+this:
 
 ```
 ./docspell-joex-{{version()}}/bin/docspell-joex -J-Xmx1596M -J-XX:+UseG1GC -- /path/to/joex.conf