Eike Kettner
|
a699e87304
|
Separate ner from classification
|
2021-01-17 22:56:33 +01:00 |
|
Eike Kettner
|
f02f15e5bd
|
Move blocker into constructor of text analyser
|
2021-01-17 22:56:33 +01:00 |
|
Eike Kettner
|
b2b8ad625a
|
scalafmt
|
2021-01-17 20:11:58 +01:00 |
|
Eike Kettner
|
75986c461f
|
Fix ner date label boundary reporting
|
2021-01-10 09:10:39 +01:00 |
|
Eike Kettner
|
fb05e997ab
|
Provide multiple date suggestions for English
Issue: #561
|
2021-01-10 09:02:26 +01:00 |
|
Eike Kettner
|
716252721c
|
Fix cache clearing
It must be cancelled when obtaining a pipeline.
|
2021-01-07 23:31:01 +01:00 |
|
Eike Kettner
|
a670bbb6c2
|
Make idle interval when clearing nlp cache configurable
|
2021-01-06 23:03:00 +01:00 |
|
Eike Kettner
|
73a9572835
|
PoC for clearing stanford pipeline after some idle time
|
2021-01-05 23:56:20 +01:00 |
|
Tammo van Lessen
|
e9347176bd
|
Fixes an off-by-one classic to also accept dates in January
|
2020-11-28 00:43:35 +01:00 |
|
Eike Kettner
|
cf6e63785d
|
Fix potential index-out-of-bounds error in classifier
The stanford library expects a non-empty text.
|
2020-11-09 00:04:51 +01:00 |
|
Eike Kettner
|
3f697f51aa
|
Autoformat
|
2020-10-06 23:31:09 +02:00 |
|
Eike Kettner
|
53c8d3031d
|
Skip invalid dates found in texts
Fixes: #298
|
2020-10-02 22:37:15 +02:00 |
|
Eike Kettner
|
c658677032
|
Autoformat
|
2020-09-09 00:29:32 +02:00 |
|
Eike Kettner
|
97757876d5
|
Fix formatting
|
2020-09-08 00:47:42 +02:00 |
|
Eike Kettner
|
c9bd57592b
|
Don't use test data if there is just one config
If classifier models cannot be compared, there is no reason to test.
|
2020-09-07 20:02:50 +02:00 |
|
Eike Kettner
|
316b490008
|
Implement learning a text classifier from collective data
|
2020-09-02 18:28:14 +02:00 |
|
Eike Kettner
|
0c97b4ef76
|
Initial impl of a text classifier based on stanford-nlp
|
2020-09-02 18:28:14 +02:00 |
|
Eike Kettner
|
96d2f948f2
|
Use collective's addressbook to configure regexner
|
2020-08-24 14:40:52 +02:00 |
|
Eike Kettner
|
8628a0a8b3
|
Allow configuring stanford-ner and cache based on collective
|
2020-08-24 10:55:59 +02:00 |
|
Eike Kettner
|
fdb46da26d
|
Add french language and upgrade stanford-nlp to 4.0.0
|
2020-08-23 17:48:42 +02:00 |
|
Eike Kettner
|
347a029af8
|
Scalafix organize-imports
|
2020-06-28 21:20:47 +02:00 |
|
Eike Kettner
|
897d91475e
|
Update scalafmt-core to 2.6.0
|
2020-06-17 19:53:56 +02:00 |
|
Eike Kettner
|
075b665c68
|
Add some more tlds to look for
|
2020-05-24 11:48:49 +02:00 |
|
Eike Kettner
|
5e6ce1737c
|
Change recognizing dates with short years
Short years are now added to the current century (2000) such that date
strings like 12/26/11 result in 12/26/2011 and not 12/26/1911.
|
2020-05-17 11:58:51 +02:00 |
|
Eike Kettner
|
c41cdeefec
|
Update scalafmt to 2.5.1 + scalafmtAll
|
2020-05-04 23:53:57 +02:00 |
|
Eike Kettner
|
6a1297fc95
|
Add a limit for text analysis
|
2020-03-27 22:54:49 +01:00 |
|
Eike Kettner
|
9656ba62f4
|
scalafmtAll
|
2020-03-26 18:26:00 +01:00 |
|
Eike Kettner
|
2f87065b2e
|
sbt scalafmtAll
|
2020-02-25 20:55:00 +01:00 |
|
Eike Kettner
|
8143a4edcc
|
Adding extraction primitives
|
2020-02-16 21:37:26 +01:00 |
|
Eike Kettner
|
851ee7ef0f
|
Reorganize processing code
Use separate modules for
- text extraction
- conversion to pdf
- text analysis
|
2020-02-15 21:25:25 +01:00 |
|