Because of the missing dash, Solr sent a DNS query every minute, and each one returned NXDOMAIN.
In my case the queried names looked like "f.base.domain", where base.domain is the base domain set for local DNS.
Since this file cannot be changed inside the image, and people need to
specify a new file or env variables, it doesn't make sense to add it.
Also, if it is present, it takes precedence over the env variables.
The Docker bash scripts determine the version by reading the sbt
files instead of calling sbt. This relied on the version appearing at
a specific position. It is now a bit more robust.
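
As an illustration of position-independent version lookup, here is a
minimal sketch. It is not the actual bash script: Python is used only
for clarity, and the function name `read_sbt_version` and the assumed
line shapes (`version in ThisBuild := "..."` or
`ThisBuild / version := "..."`) are assumptions, not taken from the
build.

```python
import re
from typing import Optional

# Assumed shapes of the version line (hypothetical, not from the real build):
#   version in ThisBuild := "0.20.0-SNAPSHOT"
#   ThisBuild / version := "0.20.0"
# The match is case-sensitive, so keys like scalaVersion are ignored.
VERSION_RE = re.compile(r'\bversion\b[^"\n]*:=\s*"([^"]+)"')


def read_sbt_version(text: str) -> Optional[str]:
    """Return the first version string found in sbt file content, or None."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("//"):
            # Skip commented-out settings.
            continue
        match = VERSION_RE.search(stripped)
        if match:
            return match.group(1)
    return None
```

Searching every line for the key, rather than taking a fixed line or
column, keeps the lookup working when comments or other settings are
added above the version.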
To make it work as before, some env variables must be available to
the consumedir container. They are only needed there, but since the
env file exists, they are now listed in it.
Also partially reverts the consumedir.dockerfile. Using the base-image
makes sure that all build images are created from the same sources.
Adding a new language without nlp now only requires filling in the
following pieces:
- define a list of month names to support date recognition
- add it to joex' dockerfile to be available for tesseract
- update the solr migration/field definitions
- update the elm file so it shows up on the client
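
The first item, a month-name list used for date recognition, could
look roughly like the sketch below. This is an assumption-laden
illustration and not the project's implementation: Spanish month
names are used only as an example, and `MONTHS`, `DATE_RE` and
`find_date` are hypothetical names.

```python
import re
from datetime import date
from typing import Optional

# Illustrative month names (Spanish); list index + 1 is the month number.
MONTHS = [
    "enero", "febrero", "marzo", "abril", "mayo", "junio",
    "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre",
]

# Matches "<day> [de] <month name> [de] <year>", e.g. "3 de marzo de 2021".
DATE_RE = re.compile(
    r"\b(\d{1,2})\s+(?:de\s+)?(" + "|".join(MONTHS) + r")\s+(?:de\s+)?(\d{4})\b",
    re.IGNORECASE,
)


def find_date(text: str) -> Optional[date]:
    """Return the first date written with a month name, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = MONTHS.index(month_name.lower()) + 1
    try:
        return date(int(year), month, int(day))
    except ValueError:
        # Reject impossible dates such as "31 de febrero".
        return None
```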
Improves and reorganizes how nlp pipelines are set up. Now users can
choose from many options, depending on their hardware and usage
scenario.
This is the base to use more languages without depending on what
stanford-nlp supports. Support then involves text extraction and
simple regex-ner processing.
The container name can't be hard-coded, and each joex instance needs a
unique name. Since Docker always sets the `HOSTNAME` variable and
hostnames are unique, we can just interpolate the hostname into the
joex app identifier and avoid creating multiple config files.
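
The idea of deriving the identifier from `HOSTNAME` can be sketched
as follows. The real change happens in the joex configuration, not in
Python; the function name `joex_app_id` and the `joex-` prefix are
hypothetical and serve only to show the interpolation.

```python
import os
from typing import Mapping


def joex_app_id(env: Mapping[str, str] = os.environ, prefix: str = "joex-") -> str:
    """Build a per-container app identifier from the HOSTNAME variable.

    The prefix is an assumption made for this sketch.
    """
    hostname = env.get("HOSTNAME")
    if not hostname:
        # Without a hostname the identifier would not be unique.
        raise ValueError("HOSTNAME is not set")
    return prefix + hostname
```

With this approach every replica shares one config template, and the
identifier differs only by the hostname the container was given.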