Mirror of https://github.com/TheAnachronism/docspell.git (synced 2025-04-04 10:29:34 +00:00)
Update docs on rpi regarding requirements for joex
parent: 616284c167
commit: a36c499fb1
@@ -35,7 +35,7 @@ in
 waitForTarget = "solr-init.target";
 bind.address = "0.0.0.0";
 base-url = "http://localhost:7878";
-jvmArgs = [ "-J-Xmx2g" ];
+jvmArgs = [ "-J-Xmx1536M" ];
 inherit full-text-search;
 };
 services.docspell-restserver = {
@@ -19,19 +19,28 @@ work is done by the joex components.
 ## Joex
 
 Running the joex component on the Raspberry Pi is possible, but will
-result in long processing times for OCR. Files that don't require OCR
-are no problem.
+result in long processing times for OCR and text analysis. The board
+should provide 4G of RAM (like the current RPi4), especially if also a
+database and solr are running next to it. I recommend to give joex a
+heap of 1.5G (`-J-Xmx1536M`). You should also set the joex pool size
+to 1.
 
-Tested on a RPi model 3 (4 cores, 1G RAM) processing a PDF (scanned
-with 300dpi) with two pages took 9:52. You can speed it up
-considerably by uninstalling the `unpaper` command, because this step
-takes quite long. This, of course, reduces the quality of OCR. But
-without `unpaper` the same sample pdf was then processed in 1:24, a
-speedup of 8 minutes.
+When joex processes the first file, some models are built loaded into
+memory which can take a while. Subsequent processing times are faster
+then.
 
-You should limit the joex pool size to 1 and, depending on your model
-and the amount of RAM, set a heap size of at least 500M
-(`-J-Xmx500M`).
+An example: on this [UP
+board](https://up-board.org/up/specifications/) with an Intel Atom
+x5-Z8350 CPU (@1.44Ghz) and 4G RAM, a scanned (300dpi) pdf file with 6
+pages took *3:20 min* to process. This board also runs the SOLR and a
+postgresql database.
 
-For personal setups, when you don't need the processing results asap,
-this can work well enough.
+The same file was processed in 55s on a qemu virtual machine on my i7
+notebook, using 1 CPU and 4G RAM (and identical config for joex). The
+virtual machine only had to host docspell (joex and restserver, but
+the restserver is very lightweight).
+
+The learning task for text classification can also use high amount of
+memory, but this depends on the amount of data you have in docspell.
+If you encounter problems here, you can set the maximum amount of
+items to consider in the collective settings page.
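The updated docs recommend two joex settings for a 4G board that also hosts the database and SOLR: a heap of 1.5G (`-J-Xmx1536M`) and a pool size of 1. The sketch below shows how both might be set together in a NixOS configuration like the one changed in the Nix hunk. Only `jvmArgs` is taken from this commit; the `scheduler.pool-size` option path is an assumption (it mirrors the joex scheduler's `pool-size` config key), so check it against the docspell NixOS module for your version before relying on it.

```nix
# Sketch only: joex settings for a Raspberry Pi with 4G RAM that also
# runs the database and SOLR. Other joex options from the hunk above
# (bind.address, base-url, full-text-search, ...) are omitted here.
{
  services.docspell-joex = {
    # Heap of 1.5G, the value this commit switches to.
    jvmArgs = [ "-J-Xmx1536M" ];
    # Process one job at a time. ASSUMED option path; verify it
    # against the docspell NixOS module before use.
    scheduler.pool-size = 1;
  };
}
```

Keeping the pool size at 1 means only one document is processed at a time, so the 1.5G heap does not have to be shared between parallel OCR and text-analysis jobs.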