docspell/website/site/content/docs/tools/convert-all-pdf.md
2021-07-29 01:48:23 +02:00

+++
title = "Convert All PDFs (⊗)"
description = "Convert all PDF files using OcrMyPdf."
weight = 160
+++

{% infobubble(mode="info", title="⚠ Please note") %} This script is now obsolete; use the CLI tool instead.

Use the convert-all-pdfs admin command, e.g. dsc admin convert-all-pdfs. {% end %}

convert-all-pdfs.sh

Version 0.9.0 added support for another external tool, OCRmyPDF, which converts PDF files so that they contain an OCR-ed text layer. This tool is optional and can be disabled.

To convert previously processed files with this tool, there is an endpoint that submits a task to convert all PDF files of your collective that have not been converted yet.

There is no UI part to trigger this route, so you need to use curl or the script convert-all-pdfs.sh in the tools/ directory.

Requirements

It is a bash script that additionally needs curl and jq.

Usage

./convert-all-pdfs.sh [docspell-base-url]

For example, if docspell is at http://localhost:7880:

./convert-all-pdfs.sh http://localhost:7880

The script asks for your account name and password. It then logs in and triggers the endpoint described above. Afterwards you should see a few tasks running.
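The login-then-trigger flow can be sketched roughly as follows. This is a minimal illustration, not the script shipped in tools/: the function names (`api_url`, `login_body`, `trigger_convert`) are hypothetical, and the endpoint paths, the `X-Docspell-Auth` header and the `success`/`token` response fields are assumptions that should be checked against the REST API documentation of your docspell version.

```shell
#!/usr/bin/env bash
# Sketch only: endpoint paths, header name and JSON field names below are
# assumptions, not taken from this page. Requires curl and jq.

# Join a base url and an API path, dropping one trailing slash from the base.
api_url() {
    local base="${1%/}"
    printf '%s%s\n' "$base" "$2"
}

# Build the JSON login body with jq so special characters are escaped properly.
login_body() {
    jq -n --arg account "$1" --arg password "$2" \
        '{account: $account, password: $password}'
}

# Log in, then submit the convert-all task with the returned token.
trigger_convert() {
    local base="$1" account="$2" password="$3" auth token
    auth=$(curl --fail --silent -X POST \
        -H 'Content-Type: application/json' \
        --data-binary "$(login_body "$account" "$password")" \
        "$(api_url "$base" /api/v1/open/auth/login)") || return 1   # assumed path
    # Stop if the server did not report a successful login (assumed field).
    [ "$(printf '%s' "$auth" | jq -r .success)" = "true" ] || return 1
    token=$(printf '%s' "$auth" | jq -r .token)                      # assumed field
    curl --fail --silent -X POST \
        -H "X-Docspell-Auth: $token" \
        "$(api_url "$base" /api/v1/sec/item/convertallpdfs)"         # assumed path
}
```

Under these assumptions, `trigger_convert http://localhost:7880 "$account" "$password"` would perform both requests; the functions do nothing until they are called.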

There is one task per file to convert. All of these tasks are submitted with low priority, so files uploaded through the webapp or through a source with high priority are processed first, as configured in the job executor. This keeps normal processing undisturbed while many conversion tasks are running.