mirror of
https://github.com/TheAnachronism/docspell.git
synced 2024-11-13 02:31:10 +00:00
bc1ec90b6e
This cuts down memory use considerably when high-dpi images are embedded in PDFs. The test file, scanned at 600 dpi and resulting in a 5.4M PDF file, contains a 9900x13800 image. This image is loaded into memory so that PDFBox can scale it down, which easily results in out-of-memory errors (this image alone already requires ~400M). With subsampling, the size is reduced by a factor of at most 8. It is still recommended to avoid high-dpi image-only scans for text-based documents, or to increase the heap size for joex.
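The numbers above can be checked with a small back-of-the-envelope sketch. It is not docspell's or PDFBox's code. It assumes 3 bytes per pixel (8-bit RGB) and that the subsampling factor applies per axis, as is usual for image subsampling. The function names and the `max_pixels` budget are hypothetical and only serve the illustration.

```python
def raw_image_bytes(width, height, bytes_per_pixel=3):
    """Approximate size of an uncompressed raster held in memory.

    bytes_per_pixel=3 assumes 8-bit RGB; this is an assumption.
    """
    return width * height * bytes_per_pixel


def subsampled_size(width, height, factor):
    """Dimensions when only every `factor`-th pixel per axis is decoded."""
    # Ceiling division: a partial trailing block still yields one pixel.
    return (-(-width // factor), -(-height // factor))


def pick_subsampling(width, height, max_pixels, max_factor=8):
    """Smallest power-of-two factor, capped at max_factor, that fits max_pixels.

    max_pixels is a hypothetical budget, not a docspell setting.
    """
    factor = 1
    while factor < max_factor:
        w, h = subsampled_size(width, height, factor)
        if w * h <= max_pixels:
            break
        factor *= 2
    return factor


if __name__ == "__main__":
    full = raw_image_bytes(9900, 13800)
    w, h = subsampled_size(9900, 13800, 8)
    print(full)                      # 409860000 bytes, i.e. roughly 400M
    print((w, h))                    # (1238, 1725)
    print(raw_image_bytes(w, h))     # 6406650 bytes
```

Under these assumptions the full 9900x13800 raster needs about 410 MB, while a per-axis factor of 8 brings it down to a few megabytes. Because the factor is capped, sufficiently large scans can still exceed the heap, which is why the recommendation to avoid such scans or raise the heap size remains.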
main
test/scala/docspell/extract