f01646aeb5
Reorganize nlp pipeline and add nlp-unsupported language italian
...
Improves and reorganizes how nlp pipelines are set up. Users can now
choose from several options, depending on their hardware and usage
scenario.
This is the base for using more languages without depending on what
stanford-nlp supports. Support for such languages is then limited to
text extraction and simple regex-ner processing.
2021-01-18 17:41:40 +01:00
a70e9ab614
Store used language for processing on attachmentmeta
...
Issue: #570
2021-01-17 22:56:33 +01:00
aa937797be
Choose nlp mode in config file
2021-01-17 22:56:33 +01:00
a699e87304
Separate ner from classification
2021-01-17 22:56:33 +01:00
f02f15e5bd
Move blocker into constructor of text analyser
2021-01-17 22:56:33 +01:00
d77b5855e4
Set default pool-size to 1
2021-01-11 22:30:59 +01:00
bddafa7d28
Fix looping over already seen mails when they are skipped
...
When a mail is skipped due to a filter, it must still enter the
post-handling step. Otherwise it will be seen again on the next run.
Issue: #551
2021-01-09 15:07:18 +01:00
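The skipped-mail fix above can be illustrated with a minimal sketch. It is not the project's code: `Mailbox`, `post_handle` and `scan` are hypothetical names, and post-handling is assumed to simply record the mail as seen.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Mailbox:
    """Hypothetical stand-in for a mail folder plus the 'already seen' bookkeeping."""
    mails: list[str]
    seen: set[str] = field(default_factory=set)


def post_handle(box: Mailbox, mail: str) -> None:
    # Assumption: post-handling marks the mail so later runs ignore it.
    box.seen.add(mail)


def scan(box: Mailbox, accept: Callable[[str], bool]) -> list[str]:
    """Import accepted mails; run post-handling for accepted AND skipped ones."""
    imported = []
    for mail in box.mails:
        if mail in box.seen:
            continue
        if accept(mail):
            imported.append(mail)
        # Runs even when the filter rejected the mail, so it is not
        # picked up again on the next scan.
        post_handle(box, mail)
    return imported


box = Mailbox(["invoice", "newsletter"])
first_run = scan(box, lambda m: m == "invoice")
second_run = scan(box, lambda m: True)
```

If `post_handle` were called only inside the `if accept(mail)` branch, the second run would visit `newsletter` again, which is the loop the commit describes.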
d712f8303d
Make glob matching case-insensitive by default
2021-01-09 13:23:15 +01:00
a670bbb6c2
Make idle interval when clearing nlp cache configurable
2021-01-06 23:03:00 +01:00
b08e88cd69
Add (unofficial) routes to get system information
2021-01-05 20:54:53 +01:00
611e480eb4
Use more prominent log line to indicate start of processing
...
Issue: #530
2021-01-02 21:47:54 +01:00
97dfcece97
Fix duplicate check on restarts
...
Issue: #530
2021-01-02 21:18:05 +01:00
2dff686fa0
Introduce unit condition
2020-12-15 21:03:47 +01:00
80406cabc2
Refactoring some code into separate files
2020-12-15 21:03:47 +01:00
5e2c5d2a50
Extends query builder
2020-12-15 21:03:46 +01:00
35c62049f5
Start converting QItem
2020-12-15 21:03:46 +01:00
613696539f
Minor refactorings
2020-12-15 21:03:46 +01:00
e3f6892abd
Convert job record
2020-12-15 21:03:46 +01:00
3cef932ccd
Convert more records
2020-12-15 21:03:46 +01:00
10b49fccf8
Converting user and userimap records
2020-12-15 21:03:46 +01:00
f5ae389eea
Cleanup remember-me tokens periodically
2020-12-04 17:59:25 +01:00
290989f67f
Reorder correspondent person suggestion based on org relationship
2020-12-01 23:39:45 +01:00
3fabe0a582
Update to Scala 2.13.4
2020-11-27 20:26:24 +01:00
5fe532001b
Allow specifying the document language with the request
2020-11-23 20:49:01 +01:00
5034e12bec
Add a subject filter to scan-mailbox args
2020-11-13 23:15:20 +01:00
e5ce1fd45f
Merge pull request #437 from eikek/upload-improvements
...
Upload improvements
2020-11-12 22:58:08 +00:00
4fd6e02ec0
Improve glob and filter archive entries
2020-11-11 21:01:23 +01:00
27eb5d70de
Apply given tags in processing step
...
Issue: #346
2020-11-11 21:01:23 +01:00
55a6f7aaf6
Add more properties to upload meta data
2020-11-11 21:01:23 +01:00
746e04c624
Improve logging when creating preview images
2020-11-10 22:25:46 +01:00
10305bc82d
Minor improvements
2020-11-09 21:16:53 +01:00
29455d638c
Add startup task to find page counts of existing files
2020-11-09 20:35:35 +01:00
a77f34b7ba
Add a processing step to retrieve page counts
2020-11-09 11:08:24 +01:00
f4e50c5229
Provide endpoints to submit tasks to re-generate previews
...
The scaling factor can be given in the config file. When this changes,
images can be regenerated via POSTing to certain endpoints. It is
possible to regenerate just one attachment preview or all within a
collective.
2020-11-09 09:00:02 +01:00
6037b54959
Don't fail processing if generating preview fails
2020-11-09 00:05:11 +01:00
709848244c
Create tasks to generate all previews
...
There is a task to generate preview images per attachment. It can
either add them (if not present yet) or overwrite them (e.g. when
some config has changed).
There is another task that selects all attachments without previews
and submits a task to create them. It is submitted automatically on
start to generate previews for all existing attachments.
2020-11-08 23:46:02 +01:00
7ba6baf6f0
Make preview image smaller
2020-11-08 15:12:56 +01:00
6db5c39d78
Fix converted filename
...
Mark it by default with a string from the config file.
Issue: #397
2020-11-08 09:45:03 +01:00
ef7cb4e779
Create a preview image of all files during processing
2020-11-08 01:25:59 +01:00
ab1139523a
Let the convert-all task retry when pdf conversion fails
2020-10-26 23:39:26 +01:00
b59696a9d3
Make sure to only remove/retry items in premature states
2020-10-26 23:39:26 +01:00
26e89bf84e
Edit org/person/equipment of multiple items
2020-10-26 13:35:47 +01:00
2e6026b817
Edit dates of multiple items
2020-10-26 13:16:03 +01:00
3c0b86cb19
Fix regex patterns used for NER
...
Patterns are split on whitespace by the nlp library and then compiled,
so each "word" must be a valid regex.
Fixes: #356
2020-10-21 00:55:14 +02:00
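The constraint described in the NER pattern fix above can be checked with a small sketch. It assumes Python's `re` module as a stand-in for the regex engine of the nlp library, which may accept a slightly different syntax; `invalid_tokens` is a hypothetical helper, not part of the project.

```python
import re


def invalid_tokens(pattern: str) -> list[str]:
    """Return the whitespace-separated tokens of `pattern` that do not compile."""
    bad = []
    for token in pattern.split():
        try:
            re.compile(token)
        except re.error:
            bad.append(token)
    return bad


# A group that spans a space is split into two halves, neither of which
# is a valid regex by itself.
broken = invalid_tokens(r"(Acme Corp)")

# Here every "word" is a valid regex, so nothing is reported.
fine = invalid_tokens(r"Acme Corp\.?")
```

A check like this can be run over user-supplied patterns before they are handed to the library, so that a broken "word" is reported instead of failing during compilation.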
3f697f51aa
Autoformat
2020-10-06 23:31:09 +02:00
d4354b8b49
Skip pdf conversion if a converted file exists
...
For images, the conversion also returns the extracted text. If saving
this text failed, it is extracted again in the following
text-extraction step.
2020-10-02 17:39:39 +02:00
b6f23b038a
Fix finding attachments for retries
...
The attachments to process again must be searched in sources and
archives, too.
2020-10-02 17:39:34 +02:00
5e21552358
Don't do duplicate check on retries
2020-10-02 16:50:52 +02:00
f6f63000be
Prepend a duplicate check when uploading files
2020-09-23 23:37:00 +02:00
c658677032
Autoformat
2020-09-09 00:29:32 +02:00