mirror of
https://github.com/TheAnachronism/docspell.git
synced 2025-09-30 00:28:23 +00:00
Use ocrmypdf tool to create pdf/a during conversion
- Use another external tool to convert PDF to PDF, which also adds the extracted text as an additional layer to the PDF.
- Although not used, the external conversion routine now checks for an existing text file that is named like the PDF file with the extension `.txt`. If present, it is included in the conversion result and used as the extracted text.
- Text extraction for PDF files now happens on the converted file, because it may already contain the text from the conversion step; this avoids running OCR twice.
- Errors during conversion are not fatal; processing continues without a converted file.
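
The behaviour described in this commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual code: `convert_pdf`, `ConversionResult`, and the sidecar naming (`.txt` appended to the full PDF file name) are assumptions made for the example. The commit only states that the text file is named like the PDF file with extension `.txt`.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass
class ConversionResult:
    """Hypothetical result type: both fields are None when conversion failed."""
    pdf: Optional[Path]   # converted file, if the tool produced one
    text: Optional[str]   # text found next to the converted file, if any


def convert_pdf(command: Sequence[str], out_pdf: Path) -> ConversionResult:
    """Run an external converter and collect its output without raising.

    `command` is the full command line of the external tool (an assumption
    of this sketch); `out_pdf` is where that tool is expected to write.
    """
    try:
        subprocess.run(list(command), check=True, capture_output=True)
    except (OSError, subprocess.CalledProcessError):
        # Non-fatal: the caller continues without a converted file.
        return ConversionResult(pdf=None, text=None)

    if not out_pdf.is_file():
        return ConversionResult(pdf=None, text=None)

    # Assumed naming: "doc.pdf" -> "doc.pdf.txt" in the same directory.
    sidecar = out_pdf.with_name(out_pdf.name + ".txt")
    text = sidecar.read_text(encoding="utf-8") if sidecar.is_file() else None
    return ConversionResult(pdf=out_pdf, text=text)
```

A caller would use `result.text` as the extracted text when it is present, and otherwise run text extraction on `result.pdf`. If `result.pdf` is `None`, processing continues with the original file.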
@@ -23,3 +23,4 @@ Some early information about certain details can be found in a few
 - [0012 Periodic Tasks](adr/0012_periodic_tasks)
 - [0013 Archive Files](adr/0013_archive_files)
 - [0014 Full-Text Search](adr/0014_fulltext_search_engine)
+- [0015 Convert PDF files](adr/0015_convert_pdf_files)