Eike Kettner
ff121d462c
Disable memory intensive tests on travis
2021-01-18 17:41:40 +01:00
Eike Kettner
f01646aeb5
Reorganize nlp pipeline and add nlp-unsupported language Italian
...
Improves and reorganizes how nlp pipelines are set up. Users can now
choose from several options, depending on their hardware and usage
scenario.
This is the basis for using more languages without depending on what
stanford-nlp supports. Support is then limited to text extraction and
simple regex-ner processing.
2021-01-18 17:41:40 +01:00
Eike Kettner
54a09861c4
Use model cache with basic annotator
2021-01-17 22:56:33 +01:00
Eike Kettner
4462ebae0f
Resurrect the basic ner classifier
2021-01-17 22:56:33 +01:00
Eike Kettner
a699e87304
Separate ner from classification
2021-01-17 22:56:33 +01:00
Eike Kettner
75986c461f
Fix ner date label boundary reporting
2021-01-10 09:10:39 +01:00
Eike Kettner
fb05e997ab
Provide multiple date suggestions for English
...
Issue: #561
2021-01-10 09:02:26 +01:00
Eike Kettner
53c8d3031d
Skip invalid dates found in texts
...
Fixes: #298
2020-10-02 22:37:15 +02:00
Eike Kettner
c658677032
Autoformat
2020-09-09 00:29:32 +02:00
Eike Kettner
0c97b4ef76
Initial impl of a text classifier based on stanford-nlp
2020-09-02 18:28:14 +02:00
Eike Kettner
96d2f948f2
Use collective's addressbook to configure regexner
2020-08-24 14:40:52 +02:00
Eike Kettner
fdb46da26d
Add French language and upgrade stanford-nlp to 4.0.0
2020-08-23 17:48:42 +02:00
Eike Kettner
9656ba62f4
scalafmtAll
2020-03-26 18:26:00 +01:00
Eike Kettner
8143a4edcc
Adding extraction primitives
2020-02-16 21:37:26 +01:00
Eike Kettner
851ee7ef0f
Reorganize processing code
...
Use separate modules for
- text extraction
- conversion to pdf
- text analysis
2020-02-15 21:25:25 +01:00