docspell/website/site/content/docs/tools/convert-all-pdf.md
2021-07-29 01:48:23 +02:00

60 lines
1.8 KiB
Markdown

+++
title = "Convert All PDFs (⊗)"
description = "Convert all PDF files using OcrMyPdf."
weight = 160
+++
{% infobubble(mode="info", title="⚠ Please note") %}
This script is now obsolete, you can use the [**CLI tool**](../cli/) instead.
Use the `convert-all-pdfs` admin command, e.g. `dsc admin
convert-all-pdfs`.
{% end %}
# convert-all-pdf.sh
With version 0.9.0 there was support added for another external tool,
[OCRMyPdf](https://github.com/jbarlow83/OCRmyPDF), that can convert
PDF files such that they contain the OCR-ed text layer. This tool is
optional and can be disabled.
In order to convert all previously processed files with this tool,
there is an
[endpoint](/openapi/docspell-openapi.html#api-Item-secItemConvertallpdfsPost)
that submits a task to convert all PDF files not already converted for
your collective.
There is no UI part to trigger this route, so you need to use curl or
the script `convert-all-pdfs.sh` in the `tools/` directory.
# Requirements
It is a bash script that additionally needs
[curl](https://curl.haxx.se/) and
[jq](https://stedolan.github.io/jq/).
# Usage
```
./convert-all-pdfs.sh [docspell-base-url]
```
For example, if docspell is at `http://localhost:7880`:
```
./convert-all-pdfs.sh http://localhost:7880
```
The script asks for your account name and password. It then logs in
and triggers the said endpoint. After this you should see a few tasks
running.
There will be one task per file to convert. All these tasks are
submitted with a low priority. So files uploaded through the webapp or
a [source](@/docs/webapp/uploading.md#anonymous-upload) with a high
priority, will be preferred as [configured in the job
executor](@/docs/joex/intro.md#scheduler-config). This is to not
disturb normal processing when many conversion tasks are being
executed.