docspell

mirror of https://github.com/TheAnachronism/docspell.git synced 2025-10-20 04:10:12 +00:00

Files

Eike Kettner 3d49ceaab5 Use ocrmypdf tool to create pdf/a during conversion

- Use another external tool (ocrmypdf) to convert pdf to pdf/a; it also
  adds the extracted text as an additional layer in the pdf

- Although currently unused, the external conversion routine now checks
  for an existing text file named after the pdf file with the extension
  `.txt`. If present, it is included in the conversion result and is
  used as the extracted text.

- Text extraction for pdf files now happens on the converted file,
  because it may already contain the text from the conversion step;
  this avoids running OCR twice.

- Errors during conversion are not fatal; processing continues
  without a converted file.
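
The flow described in the commit message above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual code: the names `ConversionResult`, `convert_pdf`, `extracted_text`, and the `run_tool` callback are hypothetical, and it assumes the text file sits next to the converted pdf with the suffix replaced by `.txt`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class ConversionResult:
    pdf: Optional[Path]   # converted file, or None if conversion failed
    text: Optional[str]   # text found next to the converted file, if any


def convert_pdf(src: Path, out: Path,
                run_tool: Callable[[Path, Path], None]) -> ConversionResult:
    """Run an external converter; any failure is treated as non-fatal."""
    try:
        run_tool(src, out)  # hypothetical wrapper around the external tool
    except Exception:
        # Processing continues without a converted file.
        return ConversionResult(pdf=None, text=None)
    if not out.is_file():
        return ConversionResult(pdf=None, text=None)
    # Assumption: the text file is named like the pdf, with a .txt suffix.
    sidecar = out.with_suffix(".txt")
    text = sidecar.read_text(encoding="utf-8") if sidecar.is_file() else None
    return ConversionResult(pdf=out, text=text)


def extracted_text(result: ConversionResult, src: Path,
                   extract: Callable[[Path], str]) -> str:
    """Prefer text from the conversion step; otherwise extract from the
    converted file, falling back to the original if conversion failed."""
    if result.text is not None:
        return result.text
    target = result.pdf if result.pdf is not None else src
    return extract(target)
```

Passing the converter and the extractor in as callbacks keeps the sketch independent of any particular tool; a real `run_tool` would invoke the external program and raise on a non-zero exit code.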

2020-07-18 17:19:29 +02:00

img

Adding extraction primitives

2020-02-16 21:37:26 +01:00

0000_use_markdown_architectural_decision_records.md

Update microsite

2020-03-28 21:44:14 +01:00

0001_components.md

Update microsite

2020-03-28 21:44:14 +01:00

0002_component_interaction.md

Update microsite

2020-03-28 21:44:14 +01:00

0003_encryption.md

Update microsite

2020-03-28 21:44:14 +01:00

0004_iso8601vsEpoch.md

Update microsite

2020-03-28 21:44:14 +01:00

0005_job-executor.md

Update microsite

2020-03-28 21:44:14 +01:00

0006_more-file-types.md

Update microsite

2020-03-28 21:44:14 +01:00

0007_convert_html_files.md

Update microsite

2020-03-28 21:44:14 +01:00

0008_convert_plain_text.md

Update microsite

2020-03-28 21:44:14 +01:00

0009_convert_office_docs.md

Update microsite

2020-03-28 21:44:14 +01:00

0010_convert_image_files.md

Update microsite

2020-03-28 21:44:14 +01:00

0011_extract_text.md

Update microsite

2020-03-28 21:44:14 +01:00

0012_periodic_tasks.md

Update microsite

2020-03-28 21:44:14 +01:00

0013_archive_files.md

Update microsite

2020-03-28 21:44:14 +01:00

0014_fulltext_search_engine.md

Update documentation and fix changelog wording

2020-06-29 20:37:52 +02:00

0015_convert_pdf_files.md

Use ocrmypdf tool to create pdf/a during conversion

2020-07-18 17:19:29 +02:00

process-files.puml

Adding extraction primitives

2020-02-16 21:37:26 +01:00

template.md

Integrate periodic tasks

2020-03-08 22:49:49 +01:00