mirror of
https://github.com/TheAnachronism/docspell.git
synced 2024-11-13 02:31:10 +00:00
3d49ceaab5
- Use another external tool to convert pdf to pdf, which also adds the extracted text as another layer to the pdf.
- Although not used, the external conversion routine now checks for an existing text file that has the same name as the pdf file but with the extension `.txt`. If present, it is included in the conversion result and used as the extracted text.
- Text extraction for pdf files now happens on the converted file, because it may already contain the text from the conversion step; this avoids running OCR twice.
- Errors during conversion are not fatal; processing continues without a converted file.

| ||
---|---|---|
img | ||
0000_use_markdown_architectural_decision_records.md | ||
0001_components.md | ||
0002_component_interaction.md | ||
0003_encryption.md | ||
0004_iso8601vsEpoch.md | ||
0005_job-executor.md | ||
0006_more-file-types.md | ||
0007_convert_html_files.md | ||
0008_convert_plain_text.md | ||
0009_convert_office_docs.md | ||
0010_convert_image_files.md | ||
0011_extract_text.md | ||
0012_periodic_tasks.md | ||
0013_archive_files.md | ||
0014_fulltext_search_engine.md | ||
0015_convert_pdf_files.md | ||
process-files.puml | ||
template.md |
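
The conversion behaviour described in the commit message above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: `ConversionResult`, `convert_pdf` and the `run_tool` callback are hypothetical names, and the sidecar file is assumed to be named like the converted pdf with `.pdf` replaced by `.txt`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class ConversionResult:
    """Outcome of one conversion attempt (hypothetical type)."""
    pdf: Optional[Path]   # converted file, or None if conversion failed
    text: Optional[str]   # text read from the sidecar file, if one exists


def convert_pdf(source: Path, target: Path,
                run_tool: Callable[[Path, Path], None]) -> ConversionResult:
    """Run an external converter and pick up an optional `.txt` sidecar."""
    try:
        # `run_tool` stands in for the external pdf-to-pdf program.
        run_tool(source, target)
    except Exception:
        # Conversion errors are not fatal: report "no converted file"
        # so the caller can continue with the original document.
        return ConversionResult(pdf=None, text=None)

    if not target.is_file():
        return ConversionResult(pdf=None, text=None)

    # Assumption: the text file sits next to the converted pdf and shares
    # its name, with the extension `.txt` instead of `.pdf`.
    sidecar = target.with_suffix(".txt")
    text = sidecar.read_text(encoding="utf-8") if sidecar.is_file() else None
    return ConversionResult(pdf=target, text=text)
```

A caller could then use `result.text` when it is present and only fall back to text extraction or OCR on `result.pdf` (or on the original file when `result.pdf` is `None`), which is how running OCR twice would be avoided.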