77 Commits

Author SHA1 Message Date
Eike Kettner
75986c461f Fix ner date label boundary reporting 2021-01-10 09:10:39 +01:00
Eike Kettner
fb05e997ab Provide multiple date suggestions for English
Issue: #561
2021-01-10 09:02:26 +01:00
Eike Kettner
716252721c Fix cache clearing
It must be cancelled when obtaining a pipeline.
2021-01-07 23:31:01 +01:00
Eike Kettner
a670bbb6c2 Make idle interval when clearing nlp cache configurable 2021-01-06 23:03:00 +01:00
Eike Kettner
73a9572835 Poc for clearing stanford pipeline after some idle time 2021-01-05 23:56:20 +01:00
Tammo van Lessen
e9347176bd Fixes an off-by-one classic to also accept dates in January 2020-11-28 00:43:35 +01:00
Eike Kettner
cf6e63785d Fix potential index-out-of-bounds error in classifier
The stanford library expects a non-empty text.
2020-11-09 00:04:51 +01:00
Eike Kettner
3f697f51aa Autoformat 2020-10-06 23:31:09 +02:00
Eike Kettner
53c8d3031d Skip invalid dates found in texts
Fixes: #298
2020-10-02 22:37:15 +02:00
Eike Kettner
c658677032 Autoformat 2020-09-09 00:29:32 +02:00
Eike Kettner
97757876d5 Fix formatting 2020-09-08 00:47:42 +02:00
Eike Kettner
c9bd57592b Don't use test data if there is just one config
If classifier models cannot be compared, there is no reason to test.
2020-09-07 20:02:50 +02:00
Eike Kettner
316b490008 Implement learning a text classifier from collective data 2020-09-02 18:28:14 +02:00
Eike Kettner
0c97b4ef76 Initial impl of a text classifier based on stanford-nlp 2020-09-02 18:28:14 +02:00
Eike Kettner
96d2f948f2 Use collective's addressbook to configure regexner 2020-08-24 14:40:52 +02:00
Eike Kettner
8628a0a8b3 Allow configuring stanford-ner and cache based on collective 2020-08-24 10:55:59 +02:00
Eike Kettner
fdb46da26d Add french language and upgrade stanford-nlp to 4.0.0 2020-08-23 17:48:42 +02:00
Eike Kettner
347a029af8 Scalafix organize-imports 2020-06-28 21:20:47 +02:00
Eike Kettner
897d91475e Update scalafmt-core to 2.6.0 2020-06-17 19:53:56 +02:00
Eike Kettner
075b665c68 Add some more tlds to look for 2020-05-24 11:48:49 +02:00
Eike Kettner
5e6ce1737c Change recognizing dates with short years
Short years are now added to the current century (2000) such that date
strings like 12/26/11 result in 12/26/2011 and not 12/26/1911.
2020-05-17 11:58:51 +02:00
Eike Kettner
c41cdeefec Update scalafmt to 2.5.1 + scalafmtAll 2020-05-04 23:53:57 +02:00
Eike Kettner
6a1297fc95 Add a limit for text analysis 2020-03-27 22:54:49 +01:00
Eike Kettner
9656ba62f4 scalafmtAll 2020-03-26 18:26:00 +01:00
Eike Kettner
2f87065b2e sbt scalafmtAll 2020-02-25 20:55:00 +01:00
Eike Kettner
8143a4edcc Adding extraction primitives 2020-02-16 21:37:26 +01:00
Eike Kettner
851ee7ef0f Reorganize processing code
Use separate modules for

- text extraction
- conversion to pdf
- text analysis
2020-02-15 21:25:25 +01:00