108 Commits

Author SHA1 Message Date
eikek
326cf1c087 Use different japanese train files for tesseract
They seem to work better as suggested here:
https://github.com/tesseract-ocr/tessdata/issues/119

Refs: #973
2021-08-13 16:46:37 +02:00
Renovate Bot
93981fe869 Update postgres Docker tag to v13.4 2021-08-12 21:27:18 +00:00
eikek
9457de32b6 Fix health check in docker images 2021-08-11 19:23:15 +02:00
eikek
860e146bf0 Use latest dsc instead of nightly 2021-07-30 16:46:47 +02:00
eikek
96adde7172 Use the dsc tool for the consumedir docker container 2021-07-29 01:48:23 +02:00
eikek
f994d4b248 Add japanese document language 2021-07-28 20:05:48 +02:00
eikek
73b86e00f0 Provide admin endpoint section in the config by docker setup 2021-07-07 21:24:59 +02:00
eikek
180b8b3969 Fix solr url in docspell.conf for the docker setup 2021-07-07 21:20:57 +02:00
eikek
d6d2793554 Fix docker build script for releases 2021-06-18 23:53:41 +02:00
eikek
6984df42de Removing old docker setup 2021-06-18 22:27:29 +02:00
eikek
76e47310ce Docker buildx setup for multiple architectures 2021-06-08 21:29:36 +02:00
eikek
71b913b19f Add release actions
GitHub workflows to create a nightly release automatically and a
stable release semi-automatically (a 2-step process).
2021-05-31 14:42:43 +02:00
eikek
b122d9eab0 Rework docker setup 2021-05-31 14:32:37 +02:00
Renovate Bot
a3059cf54c Update postgres Docker tag to v13.3 2021-05-14 22:42:43 +00:00
Renovate Bot
ac3cc80b70 Update postgres Docker tag to v13 2021-05-07 15:03:50 +00:00
Eike Kettner
df543a3e92 Make detecting version more reliable
The docker bash scripts try to get the version from sbt, without
calling sbt but reading the files. This was relying on a specific
position. It is now a bit more robust.
2021-04-12 00:42:48 +02:00
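As an illustration of the approach in df543a3e92 (reading the version from the sbt files without starting sbt), here is a minimal sketch in Python. It is not the project's bash code. The shape of the version line and the name `read_version` are assumptions made for the example.

```python
import re

# Assumed shapes of the version line (hypothetical, the real file may differ):
#   ThisBuild / version := "0.25.0-SNAPSHOT"
#   version in ThisBuild := "0.25.0-SNAPSHOT"
VERSION_RE = re.compile(r'version\s*(?:in\s+\w+\s*)?:=\s*"([^"]+)"')


def read_version(sbt_text: str) -> str:
    """Return the first version string assigned in the given sbt source."""
    match = VERSION_RE.search(sbt_text)
    if match is None:
        raise ValueError("no version assignment found")
    return match.group(1)
```

Matching on the assignment itself rather than on a fixed line or column is what keeps the lookup independent of where the line sits in the file.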
Renovate Bot
a3949e9cd0 Update postgres Docker tag to v11.11 2021-04-10 18:03:23 +00:00
Eike Kettner
a4462c626b Docker: revert PR #522, use one single joex instance
Reverts https://github.com/eikek/docspell/pull/552 to use a simpler
setup. Leave instructions to get back the more advanced setup provided
by PR #522. Also, #617 has more contextual info.

Fixes: #608
2021-03-11 01:02:13 +01:00
Eike Kettner
9991ad5fcc Add latvian language 2021-03-09 00:23:17 +01:00
Eike Kettner
131b721500 Try to fix the docker-hub build 2021-03-01 09:13:40 +01:00
Eike Kettner
42053cacca Add npm to docker build image
npm is now required to build docspell's new ui.
2021-02-14 10:33:59 +01:00
Jan Bader
ee67d49776 Fix CONSUMEDIR_DELETE being ignored
I just noticed that CONSUMEDIR_DELETE wasn't handled yet. This fixes it.
2021-02-05 16:41:22 +01:00
Eike Kettner
198b2f8f96 Move consumedir entrypoint to docker and fix compose setup
To make it work as before, some env variables are necessary to be
available to the consumedir container. It's only needed there, but
since the env file exists, they are now listed there.

Also partially reverts the consumedir.dockerfile. Using the base-image
makes sure that all build images are created from the same sources.
2021-01-28 22:31:58 +01:00
Jan Bader
5df9d11d06 Simplify consumedir.dockerfile 2021-01-27 21:40:36 +01:00
Jan Bader
bebe0dabde Add consumedir-entrypoint.sh to allow polling consumedir 2021-01-27 21:13:16 +01:00
Markus Wüst
a917d66afd Fix permission of scripts in base-binaries
Include shell files in subdirectory of `/opt/docspell-tools/` via `**/*.sh`.
2021-01-27 16:46:55 +01:00
Eike Kettner
189f202b21 Move tool scripts into a separate dir 2021-01-23 20:30:48 +01:00
Eike Kettner
3f75af0807 Add 9 more languages to the list of document languages 2021-01-18 17:41:40 +01:00
Eike Kettner
26dff18ae0 Add spanish as an example
Adding a new language without nlp now only requires filling out these
pieces:

- define a list of month names to support date recognition
- add it to joex' dockerfile to be available for tesseract
- update the solr migration/field definitions
- update the elm file so it shows up on the client
2021-01-18 17:41:40 +01:00
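The first step listed in 26dff18ae0, a list of month names to support date recognition, can be sketched as follows. This is a simplified illustration in Python, not the project's implementation. The names `MONTHS_ES`, `DATE_RE` and `find_dates`, and the single date pattern handled, are assumptions for the example.

```python
import re
from datetime import date

# Spanish month names, as an example of the per-language list.
MONTHS_ES = ["enero", "febrero", "marzo", "abril", "mayo", "junio",
             "julio", "agosto", "septiembre", "octubre", "noviembre",
             "diciembre"]
MONTH_NUMBER = {name: index + 1 for index, name in enumerate(MONTHS_ES)}

# Matches "<day> [de] <month name> [de] <year>" (one common form only).
DATE_RE = re.compile(
    r"(\d{1,2})\s+(?:de\s+)?(" + "|".join(MONTHS_ES) + r")\s+(?:de\s+)?(\d{4})",
    re.IGNORECASE,
)


def find_dates(text):
    """Return all dates in the text that are written with a month name."""
    found = []
    for day, month, year in DATE_RE.findall(text):
        try:
            found.append(date(int(year), MONTH_NUMBER[month.lower()], int(day)))
        except ValueError:
            pass  # skip impossible dates such as day 31 in a 30-day month
    return found
```

With such a list in place, recognizing dates needs no language model, only the month names of the language.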
Eike Kettner
f01646aeb5 Reorganize nlp pipeline and add nlp-unsupported language italian
Improves and reorganizes how nlp pipelines are set up. Now users can
choose from many options, depending on their hardware and usage
scenario.

This is the base for using more languages without depending on what
stanford-nlp supports. Support is then limited to text extraction and
simple regex-ner processing.
2021-01-18 17:41:40 +01:00
Bo Jeanes
36c29812c7 Allow scaling joex with docker-compose up --scale
Container name can't be hard coded and each joex instance needs a unique
name. Since Docker always sets the `HOSTNAME` variable and these are
unique, we can just interpolate the hostname into the joex app
identifier, to avoid creating multiple config files.
2021-01-09 10:33:11 +11:00
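The idea in 36c29812c7, deriving a unique joex identifier from the container's `HOSTNAME`, can be illustrated with a small Python sketch. The function name `joex_app_id`, the `joex` prefix and the fallback value are assumptions for the example. The commit itself interpolates the hostname into the joex configuration rather than using code like this.

```python
import os


def joex_app_id(env=None, prefix="joex"):
    """Build a per-container identifier from the HOSTNAME variable."""
    env = os.environ if env is None else env
    # Docker sets HOSTNAME to a value unique to each container.
    hostname = env.get("HOSTNAME", "unknown")
    return f"{prefix}-{hostname}"
```

Because the identifier comes from the environment, the same config file can be reused by every instance started with `--scale`.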
Eike Kettner
f7ffa10b07 Fix docker build
Currently these commands should be run in a single sbt session, since
the first one sets the elm compilation mode to "prod". This makes the js a
bit smaller by removing debug info.
2021-01-07 00:39:36 +01:00
Eike Kettner
2a172ce720 Remove fulltext recreate-key config value
It's now in the admin routes, protected by the
`admin-endpoint.secret`.
2021-01-04 15:18:02 +01:00
Eike Kettner
df1fc845e9 Docker: add missing language for tesseract
Closes: #525
2021-01-02 14:15:09 +01:00
totti4ever
5dbd35060a Fix ocrmypdf containers not being removed after a run
Before, they would lie around and pile up after a couple of processed items.
2020-10-31 19:35:35 +01:00
Malte
69465807c5 fixed wrong timezone because of missing tzdata package
- Now the timezone can be set as expected using TZ env variable
2020-10-27 23:23:38 +01:00
Malte
3d074c5fc9 Bugfixes
- Now using a script in `/usr/local/bin` to overwrite the default *ocrmypdf* version, which replaces the approach using a bash function
- Also had to add volume mapping to docker call

**ATTENTION** the path /tmp/docspell-convert:/tmp/docspell-convert must be mapped when starting Docspell's docker image!
2020-10-27 12:37:37 +01:00
Malte
cde7519f24 set default version of OCRmyPDF's docker image to _v11.2.1_, which seems to be the latest stable before _11.3.0_ 2020-10-27 06:57:27 +01:00
Malte
e9db579af6 added environment variable to set preferred OCRmyPDF version when using docker image
- e.g. `- OCRMYPDF_VERSION=v11.2.1`
- default is `latest`
2020-10-27 06:30:50 +01:00
Malte
c56f692ff5 (DOCKER) allows using jbarlow83's official docker image of OCRmyPDF, i.e. using a newer version
- if `/var/run/docker.sock` is found in the docker-container, this feature is activated - if not, nothing changes
- usage: bind-mount `docker.sock` from the host by using `-v` or `volumes:`
2020-10-27 06:00:55 +01:00
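The activation rule in c56f692ff5 (use the dockerized OCRmyPDF only when the Docker socket is mounted into the container) amounts to a simple existence check. A minimal Python sketch follows. The function name `ocr_backend` and the returned labels are hypothetical, and the original is implemented in shell.

```python
import os

# Path named in the commit message; it exists only if the host socket is mounted.
DOCKER_SOCKET = "/var/run/docker.sock"


def ocr_backend(socket_path: str = DOCKER_SOCKET) -> str:
    """Pick the dockerized OCRmyPDF if the socket is present, else the bundled one."""
    return "docker" if os.path.exists(socket_path) else "bundled"
```

Keying the feature on the mount means users opt in purely through their `-v` or `volumes:` settings, and setups without the mount behave as before.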
totti4ever
bcb42920e1 Proper docker files (build from code) - 2.1 (#311)
* new file: base.dockerfile
base docker file for compiling sources

deleted: build-images.sh
replaced by dev-build-images.sh

deleted: build-joex-base.sh
replaced by base / joex dockerfile

changed: consumedir.dockerfile
based on compiled tool-binaries from base image
added health check (basic, check for REST server connection)

new file: dev-build-images.sh
added one build script for all purposes
derives tag from version-file (all snapshots become latest)

new file: dev-push-images.sh
added one push script for all purposes (similar to build)

changed: docker-compose.yml
	* changed regarding entrypoints and commands now being in the images
	* added health checks for 3rd party images (postgres and solr)
	* some minor renaming of the areas

renamed: entrypoint-joex.sh -> joex-entrypoint.sh
	* for better order

renamed: joex-base.dockerfile -> joex.dockerfile
	* also reworked to base on main base image
	* plus renamed for better order

deleted: push-images.sh
	* replaced by dev-push-images.sh

deleted: push-joex-base.sh
	* not necessary anymore

changed: restserver.dockerfile
	* reworked to be based on main base image
	* smaller
	* added health check

* updated docker-compose to new images

* update docker-compose.yml

remove unnecessary network entries

* update docker-compose.yml

added missing volume for postgres

* reverted image naming scheme and added log to docker build

1. go back to local code instead of cloning git
2. added build log to docker image build script, incl. log build times.
Logs can be found in `docker/dev-log` folder
3. added docker docs and new docker build logs folder to .gitignore
4. added

* build docker images from local files instead of cloned remote repo, plus time recording of builds
 - switched way docker images are built from remote git repo to local files (which should be the git repo, but may have local changes)
 - the docker build logs will show the time needed for the single image builds

* reverted deletion of joex base dockerfile

* joex base dockerfile plus smaller improvements
  - separate joex base file
  - added docker hook to improve Docker Hub experience

* updated docker hub build hook

corrected wrong path (base is build context)

* Fix of docker hub build hook again

base path seems to be the dockerfile's folder

* fixed typo in .dockerignore

* added ability to spool log to console instead of file, especially for automated docker builds

* improved logging of build script

* minor tweaks from review (.dockerignore, docker hub hook and an error when using other repos)

* added push of non-base images to automated docker hook

* fixes for docker hub build hook

* fixed/improved docker build hook

* replaced tag-version of untagged versions with SNAPSHOT (was LATEST, which should be used for stable tags only)
plus, made the version tag mandatory for the dockerfiles

* adapted docker build and push scripts for tagged images (using docker automated builds)

* fixed docker build hook
stupid copy & paste mistake...

* minor mistake in build hook

* added validation of matching version numbers for docker automated builds (for non-snapshot builds)

* fixed missing fi in new validity check

* fixed docker build hook validity check

* mixed up version comparison fixed

* relative path error in hook validation

* mixed up version comparison fixed

* test

* fixed error in version matching for docker hook

* test

* improved versioning, so that docker images are v0.00.00

* revert version.sbt

got overwritten by accident

* reverted version.sbt

* improved environment parameters, especially enabled setting DB params by them

	- additionally added .env file to have the same env variables for all containers

* cleaned up docker-compose.yml to fit public origin repo again

* optimized way db params are set

figured out that we do not need the DB-String to be built at startup, since docspell.conf also reads multiple variables.
I still kept the restserver entrypoint, although we do not need it now - it might be helpful in the future or for debugging purposes

* added restart option to restart docspell, e.g. after a system reboot - but only if it was running before
2020-10-19 13:56:44 +02:00
Eike Kettner
13daa99933 Update docker and nix setup 2020-09-28 01:10:44 +02:00
Eike Kettner
28a70f56ec Fix joex docker image 2020-09-27 01:20:00 +02:00
Eike Kettner
4451ba0ef3 Configure joex with 1.5g heap in docker compose
Issue: #287
2020-09-26 13:18:54 +02:00
Eike Kettner
29ddcccbba Use a base image for joex containing all the tools 2020-09-09 22:59:34 +02:00
Eike Kettner
dc88fcb960 Update nix and docker setup 2020-09-09 22:31:35 +02:00
Eike Kettner
8e5e198098 Update nix and docker setups 2020-09-08 00:32:17 +02:00
Eike Kettner
d68d076c84 Update nix and docker setups 2020-08-15 00:34:33 +02:00
Eike Kettner
66793080d8 Update docker setup 2020-08-01 19:01:49 +02:00
Eike Kettner
3d49ceaab5 Use ocrmypdf tool to create pdf/a during conversion
- Use another external tool to convert pdf to pdf which also adds the
  extracted text as another layer into the pdf

- Although not used, the external conversion routine will now check
  for an existing text file that is named as the pdf file with extension
  `.txt`. If present it is included in the conversion result and will be
  used as the extracted text.

- text extraction for pdf files happens now on the converted file,
  because it may already contain the text from the conversion step and
  thus avoids running OCR twice.

- All errors during conversion are not fatal; processing continues
  without a converted file.
2020-07-18 17:19:29 +02:00
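The second point of 3d49ceaab5, looking for a text file that sits next to the converted pdf, can be sketched like this in Python. It is an illustration only. The function name `sidecar_text` is hypothetical, and reading "named as the pdf file with extension `.txt`" as `name.txt` (rather than `name.pdf.txt`) is an assumption.

```python
from pathlib import Path
from typing import Optional


def sidecar_text(pdf_path: Path) -> Optional[str]:
    """Return the text stored next to the pdf, or None if no such file exists."""
    # Assumption: "report.pdf" is accompanied by "report.txt".
    txt_path = pdf_path.with_suffix(".txt")
    if txt_path.is_file():
        return txt_path.read_text(encoding="utf-8")
    return None
```

When the function returns text, a later extraction step can reuse it instead of running OCR again, which is the saving the commit message describes.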
Eike Kettner
ec7b34ee6f Update nix/nixos and docker setups 2020-06-29 21:01:07 +02:00
Eike Kettner
f883648839 Add missing entrypoint script for docker 2020-06-28 13:50:14 +02:00
Eike Kettner
d3b3c6289b Prepare docker setup for fulltext search 2020-06-28 13:37:39 +02:00
Eike Kettner
41964027d1 Update docker files 2020-06-17 22:28:04 +02:00
Eike Kettner
3d902c3273 Add a docker image for watching a directory 2020-05-25 19:43:06 +02:00
Eike Kettner
0b7cc0ec6b Update nix and docker setups 2020-05-25 17:57:41 +02:00
Eike Kettner
8f46f6b57b Update docker setup 2020-04-30 22:38:53 +02:00
Eike Kettner
5b21a876aa Try to provide docker setup 2020-03-31 00:45:43 +02:00