Commit Graph

15 Commits

SHA1 Message Date
ff121d462c Disable memory intensive tests on travis 2021-01-18 17:41:40 +01:00
f01646aeb5 Reorganize nlp pipeline and add nlp-unsupported language Italian
Improves and reorganizes how nlp pipelines are set up. Now users can
choose from many options, depending on their hardware and usage
scenario.

This is the base to use more languages without depending on what
stanford-nlp supports. Support is then limited to text extraction and
simple regex-ner processing.
2021-01-18 17:41:40 +01:00
54a09861c4 Use model cache with basic annotator 2021-01-17 22:56:33 +01:00
4462ebae0f Resurrect the basic ner classifier 2021-01-17 22:56:33 +01:00
a699e87304 Separate ner from classification 2021-01-17 22:56:33 +01:00
75986c461f Fix ner date label boundary reporting 2021-01-10 09:10:39 +01:00
fb05e997ab Provide multiple date suggestions for English
Issue: #561
2021-01-10 09:02:26 +01:00
53c8d3031d Skip invalid dates found in texts
Fixes: #298
2020-10-02 22:37:15 +02:00
c658677032 Autoformat 2020-09-09 00:29:32 +02:00
0c97b4ef76 Initial impl of a text classifier based on stanford-nlp 2020-09-02 18:28:14 +02:00
96d2f948f2 Use collective's addressbook to configure regexner 2020-08-24 14:40:52 +02:00
fdb46da26d Add French language and upgrade stanford-nlp to 4.0.0 2020-08-23 17:48:42 +02:00
9656ba62f4 scalafmtAll 2020-03-26 18:26:00 +01:00
8143a4edcc Adding extraction primitives 2020-02-16 21:37:26 +01:00
851ee7ef0f Reorganize processing code
Use separate modules for

- text extraction
- conversion to pdf
- text analysis
2020-02-15 21:25:25 +01:00