mirror of
https://github.com/TheAnachronism/docspell.git
synced 2025-06-21 18:08:25 +00:00
Reorganize nlp pipeline and add nlp-unsupported language italian
Improves and reorganizes how nlp pipelines are setup. Now users can choose from many options, depending on their hardware and usage scenario. This is the base to use more languages without depending on what stanford-nlp supports. Support then is involves to text extraction and simple regex-ner processing.
This commit is contained in:
@ -15,6 +15,7 @@ RUN apk add --no-cache openjdk11-jre \
|
||||
tesseract-ocr \
|
||||
tesseract-ocr-data-deu \
|
||||
tesseract-ocr-data-fra \
|
||||
tesseract-ocr-data-ita \
|
||||
unpaper \
|
||||
wkhtmltopdf \
|
||||
libreoffice \
|
||||
|
Reference in New Issue
Block a user