35 Commits

Author SHA1 Message Date
renovate[bot]
3a43ad408c chore(deps): update alpine docker tag to v3.20.0 2024-05-22 23:07:10 +00:00
tenpai
e731d822dc
Add Japanese Vertical Support Branch for Tesseract and Ocrmypdf OCR (#2505)
* Add Japanese Vertical Support 
* Adds Japanese Vertical mappings to default configuration.
2024-04-16 20:24:57 +02:00
eikek
0a987f5b66 Change docker base images to 3.19.1
See #2504, alpine edge introduced a version of tesseract that is
problematic to use from within docspell
2024-02-29 21:52:00 +01:00
Jeff Anderson
28141b6b01
tesseract english data is now a separate package
It looks like https://git.alpinelinux.org/aports/commit/community/tesseract-ocr?id=e1dc19b16f34ba3faeba489ea3412d3b3c67c12f introduced the english data language as a separate package.
2024-01-30 15:02:45 -05:00
renovate[bot]
5ace04e3e1 Update alpine Docker tag to v20231219 2023-12-20 06:11:18 +00:00
eikek
b02a5c21fa
Merge pull request #2208 from mprasil/add-slovak-language-support
Add support for Slovak language
2023-12-19 23:13:11 +01:00
eikek
bb181f1819 Remove pip3 install
And hope the ocrmypdf installed via apk is working now.
2023-11-06 16:23:42 +01:00
renovate[bot]
99345e40bc Update alpine Docker tag to v20230901 2023-09-15 06:57:34 +00:00
Miroslav Prasil
8826712259 Add support for Slovak language
Just the basic support was added following examples for other languages.
2023-08-03 14:26:19 +01:00
renovate[bot]
90972a0cc0 Update alpine Docker tag to v20230329 2023-05-04 21:14:38 +00:00
xshadowlegendx
c576f08c53 add khmer font 2023-03-29 17:48:45 +07:00
xshadowlegendx
40642dea10 temporarily download khmer traineddata until the package is added to the registry 2023-03-17 17:50:48 +07:00
xshadowlegendx
2a89942ae0 add tesseract lang for khmer 2023-03-16 23:51:12 +07:00
eikek
1d39b5c74e docker: remove non-existing package 2022-11-27 10:00:03 +01:00
eikek
b37f98e01f Remove explicit install of zlib
The explicit install was added earlier due to a broken zlib
package (see issue #1517). This has now been fixed for a while in
alpine and can be removed.
2022-11-24 23:56:23 +01:00
mergify[bot]
5154b22003
Merge pull request #1855 from eikek/fix-docker-image
Fix docker image
2022-11-19 17:54:04 +00:00
eikek
fe967899f3 Fix joex docker image for ocrmypdf
It appears that ocrmypdf requires some other python package at
runtime. 🤷

Issue: #1850
2022-11-19 18:16:54 +01:00
GooRoo
61d5585e68 Add Ukrainian language 2022-11-09 22:24:32 +01:00
eikek
a5315f44ee Update joex docker image
Must drop wkhtmltopdf because it is no longer available in alpine.
Weasyprint is supposed to be a drop-in replacement, though it produces
poorer output in my eyes. There are alternatives like downloading
pre-compiled binaries, but not for all platforms.
2022-11-07 10:31:25 +01:00
eikek
c0feb13f63 Add Estonian language
Closes: #1646
2022-11-01 01:00:16 +01:00
eikek
7233f606af Fix dockerfiles 2022-08-12 17:30:57 +02:00
eikek
f626f684d5 Fix packages in dockerfile 2022-08-08 08:33:01 +02:00
eikek
5cd5ba46af Another try fixing the zlib issue in docker images
zlib 1.2.12-r0 is not working with openjdk, it affects the checksum
calculation of the db migrations. It must be at least 1.2.12-r1. For
some reason joex has this newer version, but the restserver image does not.
They are installed explicitly now on both images.

That's why the migration is now disabled on rest-server in the
docker-compose file. It is ok if this is run on one server. It can now
happen that on first start joex is migrating the db and the restserver
tries to do things that don't work yet - it is a corner case. This is
removed with the next version.

Refs: #1517
2022-05-25 23:49:58 +02:00
eikek
5ec311c331 Add Polish to processing languages
SOLR doesn't support Polish out of the box. Plugins are required for
Polish. The language has been added only with basic support. For
better results, a manual setup of solr is required.

Closes: #1345
2022-05-21 14:41:16 +02:00
eikek
9d69401fea Add Lithuanian to processing languages
SOLR doesn't support Lithuanian, maybe it can be added via plugins. A
manual setup of solr is required then. It has been added with basic
support.

Closes: #1540
2022-05-21 14:36:01 +02:00
eikek
d6a2ca48ca Adopt docker setup for addons (opt-in) 2022-05-21 00:44:17 +02:00
eikek
a6759a4f70 Use openjdk8 on alpine for arm64 and arm/v7 2022-03-04 21:51:40 +01:00
eikek
7afcdea9f6 Try older docker base image due to missing packages 2021-11-30 22:42:48 +01:00
eikek
26847dc970 Remove default config file in docker images
Since this file cannot be changed inside the image, and people need to
specify a new file or env variables, it doesn't make sense to add it.
Also if it is present, it is preferred to the env variables.
2021-10-25 11:27:17 +02:00
wallace
589c41003f Add hebrew document language 2021-08-24 01:19:42 +03:00
eikek
326cf1c087 Use different japanese train files for tesseract
They seem to work better as suggested here:
https://github.com/tesseract-ocr/tessdata/issues/119

Refs: #973
2021-08-13 16:46:37 +02:00
eikek
9457de32b6 Fix health check in docker images 2021-08-11 19:23:15 +02:00
eikek
f994d4b248 Add japanese document language 2021-07-28 20:05:48 +02:00
eikek
76e47310ce Docker buildx setup for multiple architectures 2021-06-08 21:29:36 +02:00
eikek
b122d9eab0 Rework docker setup 2021-05-31 14:32:37 +02:00