Some research on pdf conversion

2025-09-15 21:46:53 +00:00 · 2020-02-11 22:41:44 +01:00
parent ce22b727b1
commit 3026f199f7
30 changed files with 649 additions and 5 deletions
--- a/modules/microsite/docs/dev/adr.md
+++ b/modules/microsite/docs/dev/adr.md
@@ -5,8 +5,9 @@ title: ADRs
 # ADR
- [0001 Components](adr/0001_components.html)
+- [0001 Components](adr/0001_components)
- [0002 Component Interaction](adr/0002_component_interaction.html)
+- [0002 Component Interaction](adr/0002_component_interaction)
- [0003 Encryption](adr/0003_encryption.html)
+- [0003 Encryption](adr/0003_encryption)
- [0004 ISO8601 vs Unix](adr/0004_iso8601vsEpoch.html)
+- [0004 ISO8601 vs Unix](adr/0004_iso8601vsEpoch)
- [0005 Job Executor](adr/0005_job-executor.html)
+- [0005 Job Executor](adr/0005_job-executor)
 - [0006 More File Types](adr/0006_more-file-types)
--- a/modules/microsite/docs/dev/adr/0000_use_markdown_architectural_decision_records.md
+++ b/modules/microsite/docs/dev/adr/0000_use_markdown_architectural_decision_records.md
@@ -1,3 +1,8 @@
 ---
 layout: docs
 title: Use Markdown Architectural Decision Records
 ---
 # Use Markdown Architectural Decision Records
 ## Context and Problem Statement
--- a/modules/microsite/docs/dev/adr/0006_more-file-types.md
+++ b/modules/microsite/docs/dev/adr/0006_more-file-types.md
@@ -0,0 +1,129 @@
 ---
 layout: docs
 title: More File Types
 ---
 # More File Types
 ## Context and Problem Statement
 Docspell currently only supports PDF files. This has simplified early
 development and design a lot and so helped with starting the project.
 Handling pdf files is usually easy (to view, to extract text, print
 etc).
 The PDF format was chosen because PDF files are very common and
 can be viewed with many tools on many systems (i.e. non-proprietary
 tools). Docspell is also a document archive, and from this
 perspective it is important that documents can still be viewed in 10
 years and more. The hope is that the PDF format is best suited for
 this. Therefore all documents in Docspell must be accessible as PDF.
 The trivial solution to this requirement is to only allow PDF files.
 Support for more document types must then take care of the following:
 - extracting text
 - converting into pdf
 - access original file
 Text should be extracted from the source file, in case conversion is
 not lossless. Since Docspell can already extract text from PDF files
 using OCR, text can also be extracted from the converted file as a
 fallback.
 The original file must always be accessible. The main reason is that
 all uploaded data should be accessible without any modification. And
 since the conversion may not always create best results, the original
 file should be kept.
 ## Decision Drivers
 People expect that software like Docspell supports the most common
 document types, like all the “office documents” (`docx`, `rtf`, `odt`,
 `xlsx`, …) and images. For many people it is more common to create
 those files instead of PDF. Some (older) scanners may not be able to
 scan into PDF files but only to image files.
 ## Considered Options
 This ADR does not evaluate different options. It rather documents why
 this feature is realized and the thoughts that led to how it is
 implemented.
 ## Realization
 ### Data Model
 The `attachment` table holds one file. There will be another table
 `attachment_source` that holds the original file. It looks like this:
 ``` sql
 CREATE TABLE "attachment_source" (
  "id" varchar(254) not null primary key,
  "file_id" varchar(254) not null,
  "filename" varchar(254),
  "created" timestamp not null,
  foreign key ("file_id") references "filemeta"("id"),
  foreign key ("id") references "attachment"("attachid")
 );
 ```
 The `id` is the primary key and is the same as the id of the
 associated `attachment`, creating a `1-1` relationship (well, more
 correct is `0..1-1`) between `attachment` and `attachment_source`.
 There will always be an `attachment_source` record for every
 `attachment` record. If the original file is a PDF already, then both
 tables' `file_id` columns point to the same file. But now the user can
 change the filename of an `attachment` while the original filename is
 preserved in `attachment_source`. It must not be possible for the user
 to change anything in `attachment_source`.
 The `attachment` table is not touched in order to keep current code
 mostly unchanged and to have a simpler data migration. The downside
 is that the data model allows an `attachment` record to exist without
 an `attachment_source` record. OTOH, a foreign key inside `attachment`
 pointing to an `attachment_source` is also not correct, because it
 allows the same `attachment_source` record to be associated with many
 `attachment` records. This would do even more harm, in my opinion.
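 A minimal Scala sketch of this model may help to see the intended
 invariants. The case class and field names are hypothetical (they only
 mirror the columns mentioned above), and `sourceForPdf` and `rename`
 are illustrative helpers, not existing Docspell code:
 ```scala
 import java.time.Instant
 object AttachmentModel {
   // Hypothetical view of an `attachment` row (id, file reference, editable name).
   final case class Attachment(attachId: String, fileId: String, name: Option[String])
   // Mirrors the `attachment_source` table; `id` equals the attachment's id.
   final case class AttachmentSource(
       id: String,
       fileId: String,
       filename: Option[String],
       created: Instant
   )
   // For an uploaded PDF both records point to the same file.
   def sourceForPdf(a: Attachment, now: Instant): AttachmentSource =
     AttachmentSource(a.attachId, a.fileId, a.name, now)
   // Renaming only touches the attachment; the source keeps the original filename.
   def rename(a: Attachment, newName: String): Attachment =
     a.copy(name = Some(newName))
 }
 ```
 There is deliberately no function that modifies an `AttachmentSource`,
 which reflects the rule that users must not change it.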
 ### Migration
 Creating a new table and not altering existing ones should simplify
 data migration.
 Since only PDF files were allowed and the user could not change
 anything in the `attachment` table, the existing data can simply be
 inserted into the new table. This presents the trivial case where the
 attachment and source are the same.
 ### Processing
 The first step in processing is now converting the file into a pdf. If
 it already is a pdf, nothing is done. This step runs before text
 extraction, so extraction is first attempted on the source file; only
 if that fails (or is not supported) is text extracted from the
 converted pdf file. All remaining steps are untouched.
 If conversion is not supported for the input file, it is skipped. If
 conversion fails, the error is propagated to let the retry mechanism
 take care.
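 The order of these steps can be sketched in Scala as follows. This is
 only an illustration under assumptions: `Upload`, `Conversion`, `toPdf`
 and `extractText` are hypothetical names, and the converter and the
 extractors are passed in as plain functions:
 ```scala
 object ProcessingSketch {
   // Hypothetical input: the MIME type and the raw bytes of a file.
   final case class Upload(mime: String, data: Array[Byte])
   sealed trait Conversion
   case object AlreadyPdf extends Conversion // nothing to do
   case object Skipped extends Conversion // input type not supported
   final case class Converted(pdf: Upload) extends Conversion
   // Step 1: convert to PDF. `convert` returns None for unsupported types;
   // if it throws, the exception propagates to the caller (the retry mechanism).
   def toPdf(in: Upload, convert: Upload => Option[Upload]): Conversion =
     if (in.mime == "application/pdf") AlreadyPdf
     else convert(in).fold[Conversion](Skipped)(Converted(_))
   // Step 2: try the source file first, fall back to the PDF.
   def extractText(
       in: Upload,
       conv: Conversion,
       fromSource: Upload => Option[String],
       fromPdf: Upload => Option[String]
   ): Option[String] =
     fromSource(in).orElse(conv match {
       case Converted(pdf) => fromPdf(pdf)
       case AlreadyPdf     => fromPdf(in)
       case Skipped        => None
     })
 }
 ```
 Modelling "not supported" as a value and "failed" as an exception keeps
 the two cases apart, as described above.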
 ### What types?
 Which file types should be supported? As a first step, all major
 office documents, common images, plain text (e.g. markdown) and html
 should be supported. In terms of file extensions: `doc`, `docx`,
 `xls`, `xlsx`, `odt`, `md`, `html`, `txt`, `jpg`, `png`, `tif`.
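 As a rough sketch, the extensions above could be grouped by the kind of
 conversion they need. The `Kind` names and the lookup by file extension
 are assumptions for illustration; a real implementation would probably
 look at the content type instead:
 ```scala
 import java.util.Locale
 object FileTypes {
   sealed trait Kind
   case object Pdf extends Kind
   case object Office extends Kind
   case object Image extends Kind
   case object Text extends Kind
   case object Html extends Kind
   // Extensions as listed above (plus pdf); the grouping is an assumption.
   private val byExt: Map[String, Kind] = Map(
     "pdf" -> Pdf,
     "doc" -> Office, "docx" -> Office, "xls" -> Office, "xlsx" -> Office, "odt" -> Office,
     "md" -> Text, "txt" -> Text,
     "html" -> Html,
     "jpg" -> Image, "png" -> Image, "tif" -> Image
   )
   // Returns None for anything not in the list, meaning conversion is skipped.
   def kindOf(filename: String): Option[Kind] = {
     val dot = filename.lastIndexOf('.')
     if (dot < 0) None
     else byExt.get(filename.substring(dot + 1).toLowerCase(Locale.ROOT))
   }
 }
 ```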
 ## Links
 * [Convert HTML Files](0007_convert_html_files)
 * [Convert Plain Text](0008_convert_plain_text)
 * [Convert Office Documents](0009_convert_office_docs)
 * [Convert Image Files](0010_convert_image_files)
--- a/modules/microsite/docs/dev/adr/0007_convert_html_files.md
+++ b/modules/microsite/docs/dev/adr/0007_convert_html_files.md
@@ -0,0 +1,71 @@
 ---
 layout: docs
 title: Convert HTML Files
 ---
 # {{ page.title }}
 ## Context and Problem Statement
 How can HTML documents be converted into a PDF file that looks as much
 as possible like the original?
 It would be nice to have a java-only solution. But if an external tool
 has a better outcome, then an external tool is fine, too.
 Since Docspell is free software, the tools must also be free.
 ## Considered Options
 * [pandoc](https://pandoc.org/) external command
 * [wkhtmltopdf](https://wkhtmltopdf.org/) external command
 * [Unoconv](https://github.com/unoconv/unoconv) external command
 Native (firefox) view:
 <div class="thumbnail">
  <img src="./img/example-html-native.jpg" title="Native view of an HTML example file">
 </div>
 Note: the example html is from
 [here](https://www.sparksuite.com/open-source/invoice.html).
 I downloaded the HTML file to disk together with its resources (using
 *Save as...* in the browser).
 ### Pandoc
 <div class="thumbnail">
  <img src="./img/example-html-pandoc-latex.jpg" title="Pandoc (Latex) HTML->PDF">
 </div>
 <div class="thumbnail">
  <img src="./img/example-html-pandoc-html.jpg" title="Pandoc (html) HTML->PDF">
 </div>
 Not showing the version using the `context` pdf-engine, since it looked
 very similar to the latex variant.
 ### wkhtmltopdf
 <div class="thumbnail">
  <img src="./img/example-html-wkhtmltopdf.jpg" title="wkhtmltopdf HTML->PDF">
 </div>
 ### Unoconv
 <div class="thumbnail">
  <img src="./img/example-html-unoconv.jpg" title="Unoconv HTML->PDF">
 </div>
 ## Decision Outcome
 wkhtmltopdf.
 It shows the best results.
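 Calling it from Scala could look roughly like the sketch below. It
 assumes `wkhtmltopdf` is on the `PATH` and uses its basic
 `wkhtmltopdf <input> <output>` form; the object and method names are
 made up for this example:
 ```scala
 import scala.sys.process._
 object HtmlToPdf {
   // Builds the argument list; the program name is a parameter so it stays configurable.
   def command(input: String, output: String, program: String = "wkhtmltopdf"): Seq[String] =
     Seq(program, input, output)
   // Runs the command and returns its exit code (0 means success).
   def run(input: String, output: String): Int =
     command(input, output).!
 }
 ```
 Passing the arguments as a sequence avoids any shell quoting issues
 with file names that contain spaces.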
--- a/modules/microsite/docs/dev/adr/0008_convert_plain_text.md
+++ b/modules/microsite/docs/dev/adr/0008_convert_plain_text.md
@@ -0,0 +1,191 @@
 ---
 layout: docs
 title: Convert Text Files
 ---
 # {{ page.title }}
 ## Context and Problem Statement
 How can plain text and markdown documents be converted into PDF
 files?
 Rendering images is not important here, since the files must be
 self-contained when uploaded to Docspell.
 The test file is the current documentation page of Docspell, found in
 `microsite/docs/doc.md`.
 ```
 ---
 layout: docs
 position: 4
 title: Documentation
 ---
 # {page .title}
 Docspell assists in organizing large amounts of PDF files that are
 ...
 ## How it works
 Documents have two ...
 1. You maintain a kind of address book. It should list all possible
   correspondents and the concerning people/things. This grows
   incrementally with each new unknown document.
 2. When docspell analyzes a document, it tries to find matches within
   your address ...
 3. You can inspect ...
 The set of meta data that docspell uses to draw suggestions from, must
 be maintained ...
 ## Terms
 In order to better understand these pages, some terms should be
 explained first.
 ### Item
 An **Item** is roughly your (pdf) document, only that an item may span
 multiple files, which are called **attachments**. And an item has
 **meta data** associated:
 - a **correspondent**: the other side of the communication. It can be
  an organization or a person.
 - a **concerning person** or **equipment**: a person or thing that
  this item is about. Maybe it is an insurance contract about your
  car.
 - ...
 ### Collective
 The users of the application are part of a **collective**. A
 **collective** is a group of users that share access to the same
 items. The account name is therefore comprised of a *collective name*
 and a *user name*.
 All users of a collective are equal; they have same permissions to
 access all...
 ```
 Then a plain text file is tried, too (without any markup).
 ```
 Maecenas mauris lectus, lobortis et purus mattis
 Duis vehicula mi vel mi pretium
 In non mauris justo. Duis vehicula mi vel mi pretium, a viverra erat efficitur. Cras aliquam est ac eros varius, id iaculis dui auctor. Duis pretium neque ligula, et pulvinar mi placerat et. Nulla nec nunc sit amet nunc posuere vestibulum. Ut id neque eget tortor mattis tristique. Donec ante est, blandit sit amet tristique vel, lacinia pulvinar arcu.
 Pellentesque scelerisque fermentum erat, id posuere justo pulvinar ut.
 Cras id eros sed enim aliquam lobortis. Sed lobortis nisl ut eros
 efficitur tincidunt. Cras justo mi, porttitor quis mattis vel,
 ultricies ut purus. Ut facilisis et lacus eu cursus.
 In eleifend velit vitae libero sollicitudin euismod:
 - Fusce vitae vestibulum velit,
 - Pellentesque vulputate lectus quis pellentesque commodo
 the end.
 ```
 ## Considered Options
 * [flexmark](https://github.com/vsch/flexmark-java) for markdown to
  HTML, then use existing machinery described in [adr
  7](./0007_convert_html_files)
 * [pandoc](https://pandoc.org/) external command
 ### flexmark markdown library for java
 Process files with [flexmark](https://github.com/vsch/flexmark-java)
 and then create a PDF from the resulting html.
 Using the following snippet:
 ``` scala
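 // Imports assume a recent flexmark release and cats-effect on the classpath.
 import java.nio.charset.StandardCharsets
 import java.nio.file.{Files, Paths}
 import java.util
 import cats.effect.ExitCode
 import com.vladsch.flexmark.ext.gfm.strikethrough.StrikethroughExtension
 import com.vladsch.flexmark.ext.tables.TablesExtension
 import com.vladsch.flexmark.html.HtmlRenderer
 import com.vladsch.flexmark.parser.Parser
 import com.vladsch.flexmark.util.data.{DataKey, MutableDataSet}
 def renderMarkdown(): ExitCode = {
    // Enable the table and strikethrough extensions for parser and renderer.
    val opts = new MutableDataSet()
    opts.set(Parser.EXTENSIONS.asInstanceOf[DataKey[util.Collection[_]]],
      util.Arrays.asList(TablesExtension.create(),
      StrikethroughExtension.create()))
    val parser = Parser.builder(opts).build()
    val renderer = HtmlRenderer.builder(opts).build()
    // "in.txt|md" stands for either test input: in.txt or in.md
    val reader = Files.newBufferedReader(Paths.get("in.txt|md"))
    val doc = parser.parseReader(reader)
    val html = renderer.render(doc)
    // Wrap the fragment in a minimal page; the padding gives the PDF some margin.
    val body = "<html><head></head><body style=\"padding: 0 5em;\">" + html + "</body></html>"
    Files.write(
      Paths.get("test.html"),
      body.getBytes(StandardCharsets.UTF_8))
    ExitCode.Success
 }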
 ```
 Then run the result through `wkhtmltopdf`.
 Markdown file:
 <div class="thumbnail">
  <img src="./img/example-md-java.jpg" title="Flexmark/wkhtmltopdf MD->PDF">
 </div>
 TXT file:
 <div class="thumbnail">
  <img src="./img/example-txt-java.jpg" title="Flexmark/wkhtmltopdf TXT->PDF">
 </div>
 ### pandoc
 Command:
 ```
 pandoc -f markdown -t html -o test.pdf microsite/docs/doc.md
 ```
 Markdown/Latex:
 <div class="thumbnail">
  <img src="./img/example-md-pandoc-latex.jpg" title="Pandoc (Latex) MD->PDF">
 </div>
 Markdown/Html:
 <div class="thumbnail">
  <img src="./img/example-md-pandoc-html.jpg" title="Pandoc (html) MD->PDF">
 </div>
 Text/Latex:
 <div class="thumbnail">
  <img src="./img/example-txt-pandoc-latex.jpg" title="Pandoc (Latex) TXT->PDF">
 </div>
 Text/Html:
 <div class="thumbnail">
  <img src="./img/example-txt-pandoc-html.jpg" title="Pandoc (html) TXT->PDF">
 </div>
 ## Decision Outcome
 Java library "flexmark".
 I think all results are great. It depends on the type of document and
 what one expects to see. I guess that most people expect something
 like pandoc-html produces for the kind of files docspell is for (it is
 not for newspaper articles, where pandoc-latex would be the best fit).
 But choosing pandoc means yet another external command to depend on.
 And the results from flexmark are really good, too. One can fiddle
 with options and css to make it look better.
 To avoid introducing another external command, the decision is to use
 flexmark and then the already existing html->pdf conversion.
--- a/modules/microsite/docs/dev/adr/0009_convert_office_docs.md
+++ b/modules/microsite/docs/dev/adr/0009_convert_office_docs.md
@@ -0,0 +1,231 @@
 ---
 layout: docs
 title: Convert Office Documents
 ---
 # {{ page.title }}
 ## Context and Problem Statement
 How can office documents, like `docx` or `odt` be converted into a PDF
 file that looks as much as possible like the original?
 It would be nice to have a java-only solution. But if an external tool
 has a better outcome, then an external tool is fine, too.
 Since Docspell is free software, the tools must also be free.
 ## Considered Options
 * [Apache POI](https://poi.apache.org) together with
  [this](https://search.maven.org/artifact/fr.opensagres.xdocreport/org.apache.poi.xwpf.converter.pdf/1.0.6/jar)
  library
 * [pandoc](https://pandoc.org/) external command
 * [abiword]() external command
 * [Unoconv](https://github.com/unoconv/unoconv) external command
 To choose an option, some documents are converted to pdf and compared.
 Only the formats `docx` and `odt` are considered here. These are the
 most used formats. They have to look good; if an `xlsx` or `pptx`
 doesn't look so great, that is ok.
 Here is the native view to compare with:
 ODT:
 <div class="thumbnail">
  <img src="./img/example-odt-native.jpg" title="Native view of an ODT example file">
 </div>
 ### `XWPFConverter`
 I couldn't get any example to work. There were exceptions:
 ```
 java.lang.IllegalArgumentException: Value for parameter 'id' was out of bounds
    at org.apache.poi.util.IdentifierManager.reserve(IdentifierManager.java:80)
    at org.apache.poi.xwpf.usermodel.XWPFRun.<init>(XWPFRun.java:101)
    at org.apache.poi.xwpf.usermodel.XWPFRun.<init>(XWPFRun.java:146)
    at org.apache.poi.xwpf.usermodel.XWPFParagraph.buildRunsInOrderFromXml(XWPFParagraph.java:135)
    at org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(XWPFParagraph.java:88)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:147)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:124)
    at docspell.convert.Testing$.withPoi(Testing.scala:17)
    at docspell.convert.Testing$.$anonfun$run$1(Testing.scala:12)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:87)
    at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:355)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:376)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:316)
    at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
    at cats.effect.internals.PoolUtils$$anon$2$$anon$3.run(PoolUtils.scala:51)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
 ```
 The project (not Apache POI, the other one) seems unmaintained. I could
 not find any website and the artifact in maven central is from 2016.
 ### Pandoc
 I know pandoc as a great tool for converting between markup
 documents, so this tries it with office documents. It supports `docx`
 and `odt` according to its `--list-input-formats` output.
 From the pandoc manual:
 > By default, pandoc will use LaTeX to create the PDF, which requires
 > that a LaTeX engine be installed (see --pdf-engine below).
 > Alternatively, pandoc can use ConTeXt, roff ms, or HTML as an
 > intermediate format. To do this, specify an output file with a .pdf
 > extension, as before, but add the --pdf-engine option or -t context,
 > -t html, or -t ms to the command line. The tool used to generate the
 > PDF from the intermediate format may be specified using --pdf-engine.
 Trying with latex engine:
 ```
 pandoc -f odt -o test.pdf example.odt
 ```
 Results ODT:
 <div class="thumbnail">
  <img src="./img/example-odt-pandoc-latex.jpg" title="Pandoc (Latex) ODT->PDF">
 </div>
 ```
 pandoc -f docx -o test.pdf example.docx
 ```
 Results DOCX:
 <div class="thumbnail">
  <img src="./img/example-docx-pandoc-latex.jpg" title="Pandoc (Latex) DOCX->PDF">
 </div>
 ----
 Trying with context engine:
 ```
 pandoc -f odt -t context -o test.pdf example.odt
 ```
 Results ODT:
 <div class="thumbnail">
  <img src="./img/example-odt-pandoc-context.jpg" title="Pandoc (Context) ODT->PDF">
 </div>
 Results DOCX:
 <div class="thumbnail">
  <img src="./img/example-docx-pandoc-context.jpg" title="Pandoc (Context) DOCX->PDF">
 </div>
 ----
 Trying with ms engine:
 ```
 pandoc -f odt -t ms -o test.pdf example.odt
 ```
 Results ODT:
 <div class="thumbnail">
  <img src="./img/example-odt-pandoc-ms.jpg" title="Pandoc (MS) ODT->PDF">
 </div>
 Results DOCX:
 <div class="thumbnail">
  <img src="./img/example-docx-pandoc-ms.jpg" title="Pandoc (MS) DOCX->PDF">
 </div>
 ----
 Trying with html engine (this requires `wkhtmltopdf` to be present):
 ```
 pandoc --extract-media . -f odt -t html -o test.pdf example.odt
 ```
 Results ODT:
 <div class="thumbnail">
  <img src="./img/example-odt-pandoc-html.jpg" title="Pandoc (html) ODT->PDF">
 </div>
 Results DOCX:
 <div class="thumbnail">
  <img src="./img/example-docx-pandoc-html.jpg" title="Pandoc (html) DOCX->PDF">
 </div>
 ### Abiword
 Trying with:
 ```
 abiword --to=pdf example.odt
 ```
 Results:
 <div class="thumbnail">
  <img src="./img/example-odt-abiword.jpg" title="Abiword ODT->PDF">
 </div>
 Trying with a `docx` file failed. It worked with a `doc` file.
 ### Unoconv
 Unoconv relies on libreoffice/openoffice, so installing it will result
 in installing parts of libreoffice, which is a very large dependency.
 Trying with:
 ```
 unoconv -f pdf example.odt
 ```
 Results ODT:
 <div class="thumbnail">
  <img src="./img/example-odt-unoconv.jpg" title="Unoconv ODT->PDF">
 </div>
 Results DOCX:
 <div class="thumbnail">
  <img src="./img/example-docx-unoconv.jpg" title="Unoconv DOCX->PDF">
 </div>
 ## Decision Outcome
 Unoconv.
 The results from `unoconv` are really good.
 Abiword is not that bad either: it didn't convert the chart, but all
 font markup is there. It would be great to not depend on something as
 big as libreoffice, but the results are so much better.
 Pandoc also deals very well with DOCX files (using the `context`
 engine). The only thing that was not rendered was the embedded chart
 (like abiword). But all images and font styling were present.
 It will be a configurable external command anyway, so users can
 swap it for a different one at any time.
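 One way such a configurable command could be modelled is sketched
 below. The `Config` shape and the `{in}` / `{out}` placeholders are
 assumptions made for this example, not a settled configuration format:
 ```scala
 object ExternalCommand {
   // Hypothetical config: a program plus arguments containing {in} / {out} placeholders.
   final case class Config(program: String, args: List[String])
   // Default based on the unoconv call shown above, with an explicit output file.
   val unoconv: Config =
     Config("unoconv", List("-f", "pdf", "-o", "{out}", "{in}"))
   // Replaces the placeholders and returns the full command line as a list.
   def render(cfg: Config, infile: String, outfile: String): List[String] =
     cfg.program :: cfg.args.map(
       _.replace("{in}", infile).replace("{out}", outfile)
     )
 }
 ```
 Switching to pandoc would then only mean a different `Config`, for
 example `Config("pandoc", List("-f", "docx", "-t", "context", "-o", "{out}", "{in}"))`.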
--- a/modules/microsite/docs/dev/adr/0010_convert_image_files.md
+++ b/modules/microsite/docs/dev/adr/0010_convert_image_files.md
@@ -0,0 +1,16 @@
 ---
 layout: docs
 title: Convert Image Files
 ---
 # {{ page.title }}
 ## Context and Problem Statement
 How to convert image files properly to pdf?
 ## Considered Options
 * [pdfbox]() library
 * [pandoc](https://pandoc.org/) external command
--- a/modules/microsite/docs/dev/adr/img/example-docx-pandoc-context.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-docx-pandoc-context.jpg
--- a/modules/microsite/docs/dev/adr/img/example-docx-pandoc-html.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-docx-pandoc-html.jpg
--- a/modules/microsite/docs/dev/adr/img/example-docx-pandoc-latex.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-docx-pandoc-latex.jpg
--- a/modules/microsite/docs/dev/adr/img/example-docx-pandoc-ms.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-docx-pandoc-ms.jpg
--- a/modules/microsite/docs/dev/adr/img/example-docx-unoconv.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-docx-unoconv.jpg
--- a/modules/microsite/docs/dev/adr/img/example-html-native.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-html-native.jpg
--- a/modules/microsite/docs/dev/adr/img/example-html-pandoc-html.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-html-pandoc-html.jpg
--- a/modules/microsite/docs/dev/adr/img/example-html-pandoc-latex.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-html-pandoc-latex.jpg
--- a/modules/microsite/docs/dev/adr/img/example-html-unoconv.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-html-unoconv.jpg
--- a/modules/microsite/docs/dev/adr/img/example-html-wkhtmltopdf.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-html-wkhtmltopdf.jpg
--- a/modules/microsite/docs/dev/adr/img/example-md-java.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-md-java.jpg
--- a/modules/microsite/docs/dev/adr/img/example-md-pandoc-html.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-md-pandoc-html.jpg
--- a/modules/microsite/docs/dev/adr/img/example-md-pandoc-latex.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-md-pandoc-latex.jpg
--- a/modules/microsite/docs/dev/adr/img/example-odt-abiword.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-odt-abiword.jpg
--- a/modules/microsite/docs/dev/adr/img/example-odt-native.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-odt-native.jpg
--- a/modules/microsite/docs/dev/adr/img/example-odt-pandoc-context.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-odt-pandoc-context.jpg
--- a/modules/microsite/docs/dev/adr/img/example-odt-pandoc-html.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-odt-pandoc-html.jpg
--- a/modules/microsite/docs/dev/adr/img/example-odt-pandoc-latex.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-odt-pandoc-latex.jpg
--- a/modules/microsite/docs/dev/adr/img/example-odt-pandoc-ms.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-odt-pandoc-ms.jpg
--- a/modules/microsite/docs/dev/adr/img/example-odt-unoconv.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-odt-unoconv.jpg
--- a/modules/microsite/docs/dev/adr/img/example-txt-java.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-txt-java.jpg
--- a/modules/microsite/docs/dev/adr/img/example-txt-pandoc-html.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-txt-pandoc-html.jpg
--- a/modules/microsite/docs/dev/adr/img/example-txt-pandoc-latex.jpg
+++ b/modules/microsite/docs/dev/adr/img/example-txt-pandoc-latex.jpg