Update documentation

2025-08-05 02:24:52 +00:00 · 2021-01-20 21:35:54 +01:00
parent 85ddc61d9d
commit a6c31be22f
6 changed files with 206 additions and 93 deletions
--- a/website/site/content/docs/install/rpi.md
+++ b/website/site/content/docs/install/rpi.md
@@ -25,19 +25,18 @@ work is done by the joex components.
 Running the joex component on the Raspberry Pi is possible, but will
 result in long processing times for OCR and text analysis. The board
 should provide 4G of RAM (like the current RPi4), especially if also a
-database and solr are running next to it. I recommend to give joex a
-heap of 1.5G (`-J-Xmx1536M`). You should also set the joex pool size
-to 1.
-
-When joex processes the first file, some models are built loaded into
-memory which can take a while. Subsequent processing times are faster
-then.
+database and solr are running next to it. The memory required by joex
+depends on the config and document language. Please pick a value that
+suits your setup from [here](@/docs/install/running.md#memory-usage).
+For boards like the RPi, it might be necessary to use
+`nlp.mode=basic`, rather than `nlp.mode=full`. You should also set the
+joex pool size to 1.

 An example: on this [UP
 board](https://up-board.org/up/specifications/) with an Intel Atom
-x5-Z8350 CPU (@1.44Ghz) and 4G RAM, a scanned (300dpi) pdf file with 6
-pages took *3:20 min* to process. This board also runs the SOLR and a
-postgresql database.
+x5-Z8350 CPU (@1.44Ghz) and 4G RAM, a scanned (300dpi, in German) pdf
+file with 6 pages took *3:20 min* to process. This board also runs the
+SOLR and a postgresql database.

 The same file was processed in 55s on a qemu virtual machine on my i7
 notebook, using 1 CPU and 4G RAM (and identical config for joex). The
--- a/website/site/content/docs/install/running.md
+++ b/website/site/content/docs/install/running.md
@@ -35,6 +35,42 @@ You should be able to create a new account and sign in. Check the
 [configuration page](@/docs/configure/_index.md) to further customize
 docspell.

+## Memory Usage
+
+The memory requirements for the joex component depend on the document
+language and the configuration for [file
+processing](@/docs/configure/_index.md#file-processing). The
+`nlp.mode` setting has significant impact, especially when your
+documents are in German. Here are some rough numbers on jvm heap usage
+(the same small jpeg file was used for all tries):
+
+<table class="table is-hoverable is-striped">
+<thead>
+  <tr><th>nlp.mode</th><th>English</th><th>German</th><th>French</th></tr>
+</thead>
+<tfoot>
+</tfoot>
+<tbody>
+  <tr><td>full</td><td>420M</td><td>950M</td><td>490M</td></tr>
+  <tr><td>basic</td><td>170M</td><td>380M</td><td>390M</td></tr>
+</tbody>
+</table>
+
+When using `mode=full`, a heap setting of at least `-Xmx1400M` is
+recommended. For `mode=basic` a heap setting of at least `-Xmx500M` is
+recommended.
+
+Other languages can't use these two modes, and so don't require this
+amount of memory (but don't have as good results). Then you can go
+with less heap.
+
+More details about these modes can be found
+[here](@/docs/joex/file-processing.md#text-analysis).
+
+
+The restserver component is very lightweight, here you can use
+defaults.
+

 ## Options

@@ -65,10 +101,10 @@ $ ./docspell-restserver*/bin/docspell-restserver -h

 gives an overview of supported options.

-It is recommended to run joex with 1.5G heap space or more and with
-the G1GC enabled. If you use java8, you need to add an option to use
-G1GC, for java11 this is not necessary (but doesn't hurt either). This
-could look like this:
+It is recommended to run joex with the G1GC enabled. If you use java8,
+you need to add an option to use G1GC (`-XX:+UseG1GC`), for java11
+this is not necessary (but doesn't hurt either). This could look like
+this:

 ```
 ./docspell-joex-{{version()}}/bin/docspell-joex -J-Xmx1596M -J-XX:+UseG1GC -- /path/to/joex.conf