Commit Graph

303 Commits

Author SHA1 Message Date
611e480eb4 Use more prominent log line to indicate start of processing
Issue: #530
2021-01-02 21:47:54 +01:00
97dfcece97 Fix duplicate check on restarts
Issue: #530
2021-01-02 21:18:05 +01:00
2dff686fa0 Introduce unit condition 2020-12-15 21:03:47 +01:00
80406cabc2 Refactoring some code into separate files 2020-12-15 21:03:47 +01:00
5e2c5d2a50 Extends query builder 2020-12-15 21:03:46 +01:00
35c62049f5 Start converting QItem 2020-12-15 21:03:46 +01:00
613696539f Minor refactorings 2020-12-15 21:03:46 +01:00
e3f6892abd Convert job record 2020-12-15 21:03:46 +01:00
3cef932ccd Convert more records 2020-12-15 21:03:46 +01:00
10b49fccf8 Converting user and userimap records 2020-12-15 21:03:46 +01:00
f5ae389eea Cleanup remember-me tokens periodically 2020-12-04 17:59:25 +01:00
290989f67f Reorder correspondent person suggestion based on org relationship 2020-12-01 23:39:45 +01:00
3fabe0a582 Update to Scala 2.13.4 2020-11-27 20:26:24 +01:00
5fe532001b Allow to specify document language with the request 2020-11-23 20:49:01 +01:00
5034e12bec Add a subject filter to scan-mailbox args 2020-11-13 23:15:20 +01:00
e5ce1fd45f Merge pull request #437 from eikek/upload-improvements
Upload improvements
2020-11-12 22:58:08 +00:00
4fd6e02ec0 Improve glob and filter archive entries 2020-11-11 21:01:23 +01:00
27eb5d70de Apply given tags in processing step
Issue: #346
2020-11-11 21:01:23 +01:00
55a6f7aaf6 Add more properties to upload meta data 2020-11-11 21:01:23 +01:00
746e04c624 Improve logging when creating preview images 2020-11-10 22:25:46 +01:00
10305bc82d Minor improvements 2020-11-09 21:16:53 +01:00
29455d638c Add startup task to find page counts of existing files 2020-11-09 20:35:35 +01:00
a77f34b7ba Add a processing step to retrieve page counts 2020-11-09 11:08:24 +01:00
f4e50c5229 Provide endpoints to submit tasks to re-generate previews
The scaling factor can be given in the config file. When this changes,
images can be regenerated via POSTing to certain endpoints. It is
possible to regenerate just one attachment preview or all within a
collective.
2020-11-09 09:00:02 +01:00
6037b54959 Don't fail processing if generating preview fails 2020-11-09 00:05:11 +01:00
709848244c Create tasks to generate all previews
There is a task to generate preview images per attachment. It can
either add them (if not present yet) or overwrite them (e.g. some
config has changed).

There is a task that selects all attachments without previews and
submits tasks to create them. This is submitted on start automatically
to generate previews for all existing attachments.
2020-11-08 23:46:02 +01:00
7ba6baf6f0 Make preview image smaller 2020-11-08 15:12:56 +01:00
6db5c39d78 Fix converted filename
Mark it by default with a string from the config file.

Issue: 397
2020-11-08 09:45:03 +01:00
ef7cb4e779 Create a preview image of all files during processing 2020-11-08 01:25:59 +01:00
ab1139523a Let the convert-all task retry when pdf conversion fails 2020-10-26 23:39:26 +01:00
b59696a9d3 Make sure to only remove/retry items in premature states 2020-10-26 23:39:26 +01:00
26e89bf84e Edit org/person/equipment of multiple items 2020-10-26 13:35:47 +01:00
2e6026b817 Edit dates of multiple items 2020-10-26 13:16:03 +01:00
3c0b86cb19 Fix regex patterns used for NER
Patterns are split on whitespace by the nlp library and then compiled,
so each "word" must be a valid regex.

Fixes: #356
2020-10-21 00:55:14 +02:00
3f697f51aa Autoformat 2020-10-06 23:31:09 +02:00
d4354b8b49 Skip pdf conversion if a converted file exists
For images, the conversion also returns the extracted text. If saving
that text failed, it is extracted in the following
text-extraction step.
2020-10-02 17:39:39 +02:00
b6f23b038a Fix finding attachments for retries
The attachments to process again must be searched in sources and
archives, too.
2020-10-02 17:39:34 +02:00
5e21552358 Don't do duplicate check on retries 2020-10-02 16:50:52 +02:00
f6f63000be Prepend a duplicate check when uploading files 2020-09-23 23:37:00 +02:00
c658677032 Autoformat 2020-09-09 00:29:32 +02:00
76ccfb8a81 Only learn from confirmed items
Text classification should only learn from confirmed items. Log if
classification is disabled when processing an item.
2020-09-07 13:04:40 +02:00
4309bd8dfd Some cleanup 2020-09-02 21:22:30 +02:00
237b960625 Guess a tag on item processing using a trained model if available 2020-09-02 18:28:14 +02:00
316b490008 Implement learning a text classifier from collective data 2020-09-02 18:28:14 +02:00
68bb65572b Integrate learn-classifier task into the app 2020-09-02 18:28:14 +02:00
0c97b4ef76 Initial impl of a text classifier based on stanford-nlp 2020-09-02 18:28:14 +02:00
8c4f2e702b Add classifier settings 2020-09-02 18:28:14 +02:00
3473cbb773 Use collective data with NER annotation 2020-08-25 20:40:44 +02:00
96d2f948f2 Use collective's addressbook to configure regexner 2020-08-24 14:40:52 +02:00
8628a0a8b3 Allow configuring stanford-ner and cache based on collective 2020-08-24 10:55:59 +02:00
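
Note on 3c0b86cb19 (Fix regex patterns used for NER): the commit message says patterns are split on whitespace before being compiled, so each "word" must be a valid regex by itself. The following is a minimal Python sketch of that constraint, not the project's code. The helper name invalid_tokens and the sample patterns are hypothetical, and Python's re module only stands in for whatever regex engine the nlp library actually uses, so edge cases may differ.

```python
import re


def invalid_tokens(pattern):
    """Return the whitespace-separated tokens of `pattern` that do not
    compile as regexes on their own (hypothetical helper for illustration)."""
    bad = []
    for token in pattern.split():
        try:
            re.compile(token)
        except re.error:
            bad.append(token)
    return bad


# Valid as a whole, but splitting on whitespace cuts the group in two,
# leaving an unclosed "(" in one token and a stray ")" in the other.
grouped = r"(Acme Corp)"
re.compile(grouped)  # compiles fine when taken as one pattern
print(invalid_tokens(grouped))

# Every word here is a valid regex by itself, so nothing is reported.
print(invalid_tokens(r"Acme Corp"))
```

A check like this could run over address book entries before they are handed to the tagger, so that a single name containing regex metacharacters does not break pattern compilation.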