Eike Kettner
73b54addc5
Set version to 0.19.0
2021-01-25 09:21:16 +01:00
Eike Kettner
394aeeccb6
Introduce a sql literal and constants in query builder
...
The h2 jdbc driver could not translate the union query in QCollective
when the `kind` was set via a constant value. Using literals works
here. Renamed the corresponding elements in the query builder.
2021-01-25 00:18:24 +01:00
mergify[bot]
6cc9c159d6
Merge pull request #590 from eikek/scan-mailbox-filter
...
Refactor scan mailbox form and add flag for post-processing
2021-01-24 01:06:51 +00:00
Eike Kettner
96612e0e59
Refactor scan mailbox form and add flag for post-processing
...
Mails are filtered first via an imap search and then by globs on
files and subjects. Imap can search by subject via string-contains,
but not via globs or patterns (afaik), so the subject filter is
applied to all downloaded mail headers. For post-processing (moving
to some target folder or deleting), it can now be chosen whether to
post-process all "seen" mails or only those that matched all
filters.
2021-01-24 01:46:31 +01:00
Eike Kettner
1b66e2af5c
Fix classifier_settings table
2021-01-23 21:30:26 +01:00
Eike Kettner
c7e850116f
Make the text length limit optional
2021-01-22 23:06:50 +01:00
mergify[bot]
8dd1672c8c
Merge pull request #583 from eikek/fix-baseurl-setting
...
Render baseurl without trailing slash
2021-01-21 23:44:14 +00:00
mergify[bot]
38e0a50942
Merge pull request #582 from eikek/delete-org-fix
...
Fix deleting organization
2021-01-21 22:56:56 +00:00
Eike Kettner
0ec620fcf0
Render baseurl without trailing slash
...
The webapp expects it like this currently, because the url is only a
string.
2021-01-21 21:42:08 +01:00
Eike Kettner
f4a03e7c69
Fix deleting organization
...
The foreign key in person must be reset.
2021-01-21 21:27:02 +01:00
Eike Kettner
4cba96f390
Always return classifier results as suggestion
...
The classifier results are spliced into the suggestion list in second
place. When linking, they are only used if nlp didn't find anything.
2021-01-21 21:05:28 +01:00
Eike Kettner
9957c3267e
Add constraints from config to classifier training
...
For large and/or many documents, training the classifier can lead to
OOM errors. Some limits have been set by default.
2021-01-21 17:46:39 +01:00
Eike Kettner
363cf5aef0
Quote names in sql changesets
2021-01-21 00:22:58 +01:00
Eike Kettner
38387e00a0
Fix mariadb migration
2021-01-21 00:22:53 +01:00
Eike Kettner
a6c31be22f
Update documentation
2021-01-20 22:47:15 +01:00
Eike Kettner
85ddc61d9d
Move date proposal setting to nlp config
2021-01-20 19:17:29 +01:00
Eike Kettner
5d366c3bd6
Make labels in classifier settings more clear
2021-01-20 01:05:59 +01:00
Eike Kettner
b12d965223
Improve logging
2021-01-20 00:40:58 +01:00
Eike Kettner
27c24c128d
Store tags guessed with classifier in database
2021-01-20 00:30:40 +01:00
Eike Kettner
9d83cb7fe4
Store item based proposals in separate table
...
Classifiers don't work on each attachment, but on all of them
together, so the results must not be stored with an attachment. This
reverts some previous changes in order to put the classifier results
for item entities into their own table.
2021-01-19 23:48:09 +01:00
Eike Kettner
3ff9284a64
Return classifier results as suggestions
2021-01-19 23:13:51 +01:00
Eike Kettner
75573c905e
Use classifier results as fallback when linking proposed metadata
2021-01-19 23:13:34 +01:00
Eike Kettner
8455d1badf
Lookup results from classifier
...
The model may be out of date and the data may have changed. Each
result is therefore looked up to fetch its id, so that it stays
compatible with the next stages.
2021-01-19 22:56:01 +01:00
Eike Kettner
1cd3441462
Run classifier for item entities (concerned, correspondent)
...
Store the results separately from nlp results in attachment metadata.
2021-01-19 22:08:29 +01:00
Eike Kettner
d124f0c1a9
Rename db changeset
...
It's not just a fix, but adds new things
2021-01-19 22:08:29 +01:00
Eike Kettner
5c487ef7a9
Refactor running classifier in text analysis
2021-01-19 21:30:02 +01:00
Eike Kettner
99dcaae66b
Learn classifiers for item entities
...
Learns classifiers for concerned and correspondent entities. This can
be used as an alternative to or after nlp.
2021-01-19 20:54:47 +01:00
Eike Kettner
a6f29153c4
Control what tag categories to use for auto-tagging
2021-01-19 01:20:13 +01:00
Eike Kettner
cce8878898
Exclude tags w/o category from classifying; remove obsolete models
2021-01-18 21:51:49 +01:00
Eike Kettner
3e28ce1254
Add the sql concat function to query builder
2021-01-18 21:51:45 +01:00
Eike Kettner
249f9e6e2a
Extend guessing tags to all tag categories
2021-01-18 21:51:45 +01:00
Eike Kettner
3f75af0807
Add 9 more languages to the list of document languages
2021-01-18 17:41:40 +01:00
Eike Kettner
94bb18c152
Refactor solr language fields
2021-01-18 17:41:40 +01:00
Eike Kettner
26dff18ae0
Add Spanish as an example
...
Adding a new language without nlp now only requires filling in these
pieces:
- define a list of month names to support date recognition
- add it to joex' dockerfile to be available for tesseract
- update the solr migration/field definitions
- update the elm file so it shows up on the client
2021-01-18 17:41:40 +01:00
Eike Kettner
360cad3304
Refactoring solr/fts migration
...
When re-indexing everything, skip populating the index in between
and do this only as the very last step.
Parameterize adding new fields by their language.
2021-01-18 17:41:40 +01:00
Eike Kettner
ff121d462c
Disable memory intensive tests on travis
2021-01-18 17:41:40 +01:00
Eike Kettner
f01646aeb5
Reorganize nlp pipeline and add nlp-unsupported language Italian
...
Improves and reorganizes how nlp pipelines are set up. Users can now
choose from many options, depending on their hardware and usage
scenario.
This is the basis for supporting more languages without depending on
what stanford-nlp supports. Support for such languages then consists
of text extraction and simple regex-ner processing.
2021-01-18 17:41:40 +01:00
Eike Kettner
a70e9ab614
Store used language for processing on attachmentmeta
...
Issue: #570
2021-01-17 22:56:33 +01:00
Eike Kettner
6cf3f9be5a
Fix joex version endpoint in spec
2021-01-17 22:56:33 +01:00
Eike Kettner
aa937797be
Choose nlp mode in config file
2021-01-17 22:56:33 +01:00
Eike Kettner
54a09861c4
Use model cache with basic annotator
2021-01-17 22:56:33 +01:00
Eike Kettner
a77f67d73a
Make pipeline cache generic to be used with BasicCRFAnnotator
2021-01-17 22:56:33 +01:00
Eike Kettner
4462ebae0f
Resurrect the basic ner classifier
2021-01-17 22:56:33 +01:00
Eike Kettner
a699e87304
Separate ner from classification
2021-01-17 22:56:33 +01:00
Eike Kettner
f02f15e5bd
Move blocker into constructor of text analyser
2021-01-17 22:56:33 +01:00
Eike Kettner
b2b8ad625a
scalafmt
2021-01-17 20:11:58 +01:00
Eike Kettner
f0f0e6e0d4
Search for categories case-insensitive
...
The string was already lowercased, but the comparison was not.
Fixes #568
2021-01-17 20:10:24 +01:00
Eike Kettner
623a61dbb6
Introduce a lowerEq operator to the query builder
2021-01-17 20:10:00 +01:00
Eike Kettner
54bd75e99e
Set version to 0.19.0-SNAPSHOT
2021-01-11 23:27:47 +01:00
Eike Kettner
0d1b55a205
Set version to 0.18.0
2021-01-11 22:39:40 +01:00