docspell/modules/microsite/docs/dev/adr
Eike Kettner 3d49ceaab5 Use ocrmypdf tool to create pdf/a during conversion
- Use another external tool to convert PDF to PDF/A; it also adds the
  extracted text to the PDF as an additional layer

- Although not used, the external conversion routine now checks for an
  existing text file named after the PDF file with the extension
  `.txt`. If present, it is included in the conversion result and
  used as the extracted text.

- Text extraction for PDF files now happens on the converted file,
  because it may already contain the text from the conversion step;
  this avoids running OCR twice.

- Errors during conversion are not fatal; processing continues
  without a converted file.
2020-07-18 17:19:29 +02:00
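
The behaviour described in the commit message above can be illustrated with a minimal Python sketch. This is not the project's actual code: `ConversionResult`, `convert_pdf`, `extract_text`, and the injected `run_tool` and `ocr` callables are hypothetical names, and the sidecar file is assumed to share the output file's base name with a `.txt` extension.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class ConversionResult:
    pdf: Optional[Path]   # converted file, or None if the conversion failed
    text: Optional[str]   # text from a sidecar `.txt` file, if one exists


def convert_pdf(src: Path, out: Path,
                run_tool: Callable[[Path, Path], None]) -> ConversionResult:
    """Run an external converter (hypothetical `run_tool`) without failing hard."""
    try:
        run_tool(src, out)
    except Exception:
        # Conversion errors are not fatal: continue without a converted file.
        return ConversionResult(pdf=None, text=None)
    if not out.exists():
        return ConversionResult(pdf=None, text=None)
    # Assumed naming: `<output base name>.txt` next to the converted PDF.
    sidecar = out.with_suffix(".txt")
    text = sidecar.read_text(encoding="utf-8") if sidecar.exists() else None
    return ConversionResult(pdf=out, text=text)


def extract_text(original: Path, result: ConversionResult,
                 ocr: Callable[[Path], str]) -> str:
    """Prefer text produced during conversion so OCR does not run twice."""
    if result.text is not None:
        return result.text
    # Extract from the converted file when there is one, else from the original.
    target = result.pdf if result.pdf is not None else original
    return ocr(target)
```

Passing the converter and the OCR step as callables keeps the sketch independent of any particular command line; a real setup would wrap the external tool call in `run_tool`.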
img Adding extraction primitives 2020-02-16 21:37:26 +01:00
0000_use_markdown_architectural_decision_records.md Update microsite 2020-03-28 21:44:14 +01:00
0001_components.md Update microsite 2020-03-28 21:44:14 +01:00
0002_component_interaction.md Update microsite 2020-03-28 21:44:14 +01:00
0003_encryption.md Update microsite 2020-03-28 21:44:14 +01:00
0004_iso8601vsEpoch.md Update microsite 2020-03-28 21:44:14 +01:00
0005_job-executor.md Update microsite 2020-03-28 21:44:14 +01:00
0006_more-file-types.md Update microsite 2020-03-28 21:44:14 +01:00
0007_convert_html_files.md Update microsite 2020-03-28 21:44:14 +01:00
0008_convert_plain_text.md Update microsite 2020-03-28 21:44:14 +01:00
0009_convert_office_docs.md Update microsite 2020-03-28 21:44:14 +01:00
0010_convert_image_files.md Update microsite 2020-03-28 21:44:14 +01:00
0011_extract_text.md Update microsite 2020-03-28 21:44:14 +01:00
0012_periodic_tasks.md Update microsite 2020-03-28 21:44:14 +01:00
0013_archive_files.md Update microsite 2020-03-28 21:44:14 +01:00
0014_fulltext_search_engine.md Update documentation and fix changelog wording 2020-06-29 20:37:52 +02:00
0015_convert_pdf_files.md Use ocrmypdf tool to create pdf/a during conversion 2020-07-18 17:19:29 +02:00
process-files.puml Adding extraction primitives 2020-02-16 21:37:26 +01:00
template.md Integrate periodic tasks 2020-03-08 22:49:49 +01:00