Commit Graph

20 Commits

Author SHA1 Message Date
Eike Kettner
f01646aeb5 Reorganize nlp pipeline and add nlp-unsupported language italian
Improves and reorganizes how nlp pipelines are set up. Now users can
choose from many options, depending on their hardware and usage
scenario.

This is the base to use more languages without depending on what
stanford-nlp supports. For such languages, support is limited to text
extraction and simple regex-ner processing.
2021-01-18 17:41:40 +01:00
Eike Kettner
a563ba33e7 Add new joex option to nix module 2021-01-06 23:06:13 +01:00
Eike Kettner
5681581bf8 Add missing configs to nix modules 2020-12-14 14:51:56 +01:00
Eike Kettner
acbfb9464f Update nix module with new config values 2020-11-15 00:01:48 +01:00
Eike Kettner
9547d6ffac Allow setting jvm arguments in nixos modules 2020-09-08 18:07:04 +02:00
Eike Kettner
afbe9554b6 Update joex nixos module 2020-09-02 22:23:12 +02:00
Eike Kettner
3473cbb773 Use collective data with NER annotation 2020-08-25 20:40:44 +02:00
Eike Kettner
3d49ceaab5 Use ocrmypdf tool to create pdf/a during conversion
- Use another external tool to convert pdf to pdf which also adds the
  extracted text as another layer into the pdf

- Although not used, the external conversion routine will now check
  for an existing text file that is named as the pdf file with extension
  `.txt`. If present it is included in the conversion result and will be
  used as the extracted text.

- Text extraction for pdf files now happens on the converted file,
  because it may already contain the text from the conversion step and
  thus avoids running OCR twice.

- Errors during conversion are never fatal; processing continues
  without a converted file.
2020-07-18 17:19:29 +02:00
Eike Kettner
7193279053 Prepare nixos setup for full-text-search and new consumedir settings 2020-06-28 13:49:47 +02:00
Eike Kettner
d79ae6233a Restrict proposals for due date
Avoid dates too far in the future.
2020-06-26 16:58:17 +02:00
Eike Kettner
41286959b2 Updating nix modules with new config options 2020-06-25 23:56:44 +02:00
Eike Kettner
b4da523347 Update nixos modules with new config options 2020-05-25 15:32:03 +02:00
Eike Kettner
485d995277 Add list-id option to nixos module 2020-04-30 21:34:31 +02:00
Eike Kettner
6a1297fc95 Add a limit for text analysis 2020-03-27 22:54:49 +01:00
Eike Kettner
74fb0d994f Add new options to nix module 2020-03-27 20:16:18 +01:00
Eike Kettner
de3e07a77c Nix modules: change docspell user to be a normal user
Seems that unoconv requires a shell.
2020-03-01 21:34:11 +01:00
Eike Kettner
1a0f176019 Add an unoconv listener to joex nixos module 2020-03-01 01:24:07 +01:00
Eike Kettner
ec419c7bfd Adapt nix modules to new config 2020-02-22 12:40:56 +01:00
Eike Kettner
c0f39d6497 Improve nix files
List available versions; refactor modules to reuse default values.
2020-01-22 23:33:42 +01:00
Eike Kettner
23af8acff8 Add support for integrating into nix/nixos 2020-01-20 00:21:15 +01:00