8 Commits

Author SHA1 Message Date
Eike Kettner
9991ad5fcc Add Latvian language 2021-03-09 00:23:17 +01:00
Eike Kettner
3f75af0807 Add 9 more languages to the list of document languages 2021-01-18 17:41:40 +01:00
Eike Kettner
26dff18ae0 Add Spanish as an example
Adding a new language without nlp now only requires filling in these
pieces:

- define a list of month names to support date recognition
- add it to joex' dockerfile to be available for tesseract
- update the solr migration/field definitions
- update the elm file so it shows up on the client
2021-01-18 17:41:40 +01:00
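
The first step in the list above (a list of month names to support date recognition) can be illustrated with a small sketch. This is not the project's code: it is written in Python for illustration only, and the names `SPANISH_MONTHS` and `find_dates` as well as the matched date layout are assumptions.

```python
import re
from datetime import date

# Hypothetical month-name table for Spanish; the project defines its own lists.
SPANISH_MONTHS = [
    "enero", "febrero", "marzo", "abril", "mayo", "junio",
    "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre",
]


def find_dates(text, months=SPANISH_MONTHS):
    """Find dates written as '<day> [de] <month> [de] <year>' in text."""
    names = "|".join(re.escape(m) for m in months)
    pattern = re.compile(
        rf"\b(\d{{1,2}})\s+(?:de\s+)?({names})\s+(?:de\s+)?(\d{{4}})\b",
        re.IGNORECASE,
    )
    found = []
    for day, month, year in pattern.findall(text):
        try:
            # The position in the month list gives the month number.
            found.append(date(int(year), months.index(month.lower()) + 1, int(day)))
        except ValueError:
            # Skip impossible dates such as the 31st of February.
            pass
    return found
```

Under these assumptions, supporting another language would only mean passing a different twelve-entry list as `months`; the matching logic stays the same.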
Eike Kettner
f01646aeb5 Reorganize nlp pipeline and add nlp-unsupported language Italian
Improves and reorganizes how nlp pipelines are set up. Users can now
choose from many options, depending on their hardware and usage
scenario.

This is the basis for using more languages without depending on what
stanford-nlp supports. Support for such languages is then limited to
text extraction and simple regex-ner processing.
2021-01-18 17:41:40 +01:00
Eike Kettner
df1fc845e9 Docker: add missing language for tesseract
Closes: #525
2021-01-02 14:15:09 +01:00
totti4ever
bcb42920e1 Proper docker files (build from code) - 2.1 (#311)
* new file: base.dockerfile
base docker file for compiling sources

deleted: build-images.sh
replaced by dev-build-images.sh

deleted: build-joex-base.sh
replaced by base / joex dockerfile

changed: consumedir.dockerfile
based on compiled tool-binaries from base image
added health check (basic, check for REST server connection)

new file: dev-build-images.sh
added one build script for all purposes
derives tag from version-file (all snapshots become latest)

new file: dev-push-images.sh
added one push script for all purposes (similar to build)

changed: docker-compose.yml
	* changed regarding entrypoints and commands now being in the images
	* added health checks for 3rd party images (postgres and solr)
	* some minor renaming of the areas

renamed: entrypoint-joex.sh -> joex-entrypoint.sh
	* for better order

renamed: joex-base.dockerfile -> joex.dockerfile
	* also reworked to be based on the main base image
	* plus renamed for better order

deleted: push-images.sh
	* replaced by dev-push-images.sh

deleted: push-joex-base.sh
	* not necessary anymore

changed: restserver.dockerfile
	* reworked to be based on the main base image
	* smaller
	* added health check

* updated docker-compose to new images

* update docker-compose.yml

remove unnecessary network entries

* update docker-compose.yml

added missing volume for postgres

* reverted image naming scheme and added log to docker build

1. go back to local code instead of cloning git
2. added build log to docker image build script, incl. log build times.
Logs can be found in `docker/dev-log` folder
3. added docker docs and new docker build logs folder to .gitignore
4. added

* build docker images from local files instead of cloned remote repo, plus time recording of builds
 - switched way docker images are built from remote git repo to local files (which should be the git repo, but may have local changes)
 - the docker build logs will show the time needed for the single image builds

* reverted deletion of joex base dockerfile

* joex base dockerfile plus smaller improvements
  - separate joex base file
  - added docker hook to improve Docker Hub experience

* updated docker hub build hook

corrected wrong path (base is build context)

* Fix of docker hub build hook again

base path seems to be the dockerfile's folder

* fixed typo in .dockerignore

* added ability to spool log to console instead of file, especially for automated docker builds

* improved logging of build script

* minor tweaks from review (.dockerignore, docker hub hook and an error when using other repos)

* added push of non-base images to automated docker hook

* fixes for docker hub build hook

* fixed/improved docker build hook

* replaced tag-version of untagged versions with SNAPSHOT (was LATEST, which should be used for stable tags only)
plus, made the version tag mandatory for the dockerfiles

* adapted docker build and push scripts for tagged images (using docker automated builds)

* fixed docker build hook
stupid copy & paste mistake...

* minor mistake in build hook

* added validation of matching version numbers for docker automated builds (for non-snapshot builds)

* fixed missing fi in new validity check

* fixed docker build hook validity check

* mixed up version comparison fixed

* relative path error in hook validation

* mixed up version comparison fixed

* test

* fixed error in version matching for docker hook

* test

* improved versioning, so that docker images are v0.00.00

* revert version.sbt

got overwritten by accident

* reverted version.sbt

* improved environment parameters; in particular, DB params can now be set through them

	- additionally added .env file to have the same env variables for all containers

* cleaned up docker-compose.yml to fit public origin repo again

* optimized way db params are set

figured out that we do not need to build the DB string at startup; docspell.conf can also read multiple variables.
I still kept the restserver entrypoint, although we do not need it now. It might be helpful in the future or for debugging purposes

* added restart option to restart docspell, e.g. after a system reboot, but only if it was running before
2020-10-19 13:56:44 +02:00
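
The tagging rules mentioned in this commit message (untagged versions become SNAPSHOT, images are tagged like v0.00.00, and non-snapshot builds are validated against the source version) can be sketched roughly as follows. The actual build scripts and hooks are shell scripts that are not shown here; this Python sketch is illustrative only. The function names and the assumed `version.sbt` layout (a single quoted version string) are assumptions.

```python
import re


def image_tag(version_sbt_text):
    """Derive an image tag from the text of a version.sbt-style file.

    Assumes the file contains one quoted version string, e.g.
    version in ThisBuild := "0.13.0-SNAPSHOT"
    """
    match = re.search(r'"([^"]+)"', version_sbt_text)
    if match is None:
        raise ValueError("no quoted version string found")
    version = match.group(1)
    # Snapshot versions share one tag; releases get a 'v' prefix.
    if version.endswith("-SNAPSHOT"):
        return "SNAPSHOT"
    return "v" + version


def validate_release_tag(requested_tag, version_sbt_text):
    """Reject a non-snapshot build whose tag differs from the source version."""
    if requested_tag == "SNAPSHOT":
        return  # validation applies to non-snapshot builds only
    expected = image_tag(version_sbt_text)
    if requested_tag != expected:
        raise ValueError(
            f"tag {requested_tag!r} does not match source version {expected!r}"
        )
```

For example, `validate_release_tag("v0.12.0", 'version in ThisBuild := "0.12.0"')` returns without error, while a mismatching tag raises `ValueError`.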
Eike Kettner
28a70f56ec Fix joex docker image 2020-09-27 01:20:00 +02:00
Eike Kettner
29ddcccbba Use a base image for joex containing all the tools 2020-09-09 22:59:34 +02:00