Commit Graph

66 Commits

Author SHA1 Message Date
Eike Kettner
c9bd57592b Don't use test data if there is just one config
If classifier models cannot be compared, there is no reason to test.
2020-09-07 20:02:50 +02:00
Eike Kettner
316b490008 Implement learning a text classifier from collective data 2020-09-02 18:28:14 +02:00
Eike Kettner
0c97b4ef76 Initial impl of a text classifier based on stanford-nlp 2020-09-02 18:28:14 +02:00
Eike Kettner
96d2f948f2 Use collective's addressbook to configure regexner 2020-08-24 14:40:52 +02:00
Eike Kettner
8628a0a8b3 Allow configuring stanford-ner and cache based on collective 2020-08-24 10:55:59 +02:00
Eike Kettner
fdb46da26d Add french language and upgrade stanford-nlp to 4.0.0 2020-08-23 17:48:42 +02:00
Eike Kettner
347a029af8 Scalafix organize-imports 2020-06-28 21:20:47 +02:00
Eike Kettner
897d91475e Update scalafmt-core to 2.6.0 2020-06-17 19:53:56 +02:00
Eike Kettner
075b665c68 Add some more tlds to look for 2020-05-24 11:48:49 +02:00
Eike Kettner
5e6ce1737c Change recognizing dates with short years
Short years are now added to the current centure (2000) such that date
strings like 12/26/11 result in 12/26/2011 and not 12/26/1911.
2020-05-17 11:58:51 +02:00
Eike Kettner
c41cdeefec Update scalafmt to 2.5.1 + scalafmtAll 2020-05-04 23:53:57 +02:00
Eike Kettner
6a1297fc95 Add a limit for text analysis 2020-03-27 22:54:49 +01:00
Eike Kettner
9656ba62f4 scalafmtAll 2020-03-26 18:26:00 +01:00
Eike Kettner
2f87065b2e sbt scalafmtAll 2020-02-25 20:55:00 +01:00
Eike Kettner
8143a4edcc Adding extraction primitives 2020-02-16 21:37:26 +01:00
Eike Kettner
851ee7ef0f Reorganize processing code
Use separate modules for

- text extraction
- conversion to pdf
- text analysis
2020-02-15 21:25:25 +01:00