Eike Kettner
a6f29153c4
Control what tag categories to use for auto-tagging
2021-01-19 01:20:13 +01:00
Eike Kettner
cce8878898
Exclude tags w/o category from classifying; remove obsolete models
2021-01-18 21:51:49 +01:00
Eike Kettner
249f9e6e2a
Extend guessing tags to all tag categories
2021-01-18 21:51:45 +01:00
Eike Kettner
360cad3304
Refactoring solr/fts migration
...
When re-indexing everything, skip intermediate populating the index
and do this as the very last step.
Parameterize adding new fields by their language.
2021-01-18 17:41:40 +01:00
Eike Kettner
f01646aeb5
Reorganize nlp pipeline and add nlp-unsupported language italian
...
Improves and reorganizes how nlp pipelines are setup. Now users can
choose from many options, depending on their hardware and usage
scenario.
This is the base to use more languages without depending on what
stanford-nlp supports. Support then is involves to text extraction and
simple regex-ner processing.
2021-01-18 17:41:40 +01:00
Eike Kettner
a70e9ab614
Store used language for processing on attachmentmeta
...
Issue: #570
2021-01-17 22:56:33 +01:00
Eike Kettner
aa937797be
Choose nlp mode in config file
2021-01-17 22:56:33 +01:00
Eike Kettner
a699e87304
Separate ner from classification
2021-01-17 22:56:33 +01:00
Eike Kettner
f02f15e5bd
Move blocker into constructor of text analyser
2021-01-17 22:56:33 +01:00
Eike Kettner
d77b5855e4
Set default pool-size to 1
2021-01-11 22:30:59 +01:00
Eike Kettner
bddafa7d28
Fix looping over already seen mails when they are skipped
...
When skipping mails due to a filter, it must still enter the
post-handling step. Otherwise it will be seen again on next run.
Issue: #551
2021-01-09 15:07:18 +01:00
Eike Kettner
d712f8303d
Make glob matching case-insensitive by default
2021-01-09 13:23:15 +01:00
Eike Kettner
a670bbb6c2
Make idle interval when clearing nlp cache configurable
2021-01-06 23:03:00 +01:00
Eike Kettner
b08e88cd69
Add (inofficial) routes to get system information
2021-01-05 20:54:53 +01:00
Eike Kettner
611e480eb4
Use more prominent log line to indicate start of processing
...
Issue: #530
2021-01-02 21:47:54 +01:00
Eike Kettner
97dfcece97
Fix duplicate check on restarts
...
Issue: #530
2021-01-02 21:18:05 +01:00
Eike Kettner
2dff686fa0
Introduce unit condition
2020-12-15 21:03:47 +01:00
Eike Kettner
80406cabc2
Refactoring some code into separate files
2020-12-15 21:03:47 +01:00
Eike Kettner
5e2c5d2a50
Extends query builder
2020-12-15 21:03:46 +01:00
Eike Kettner
35c62049f5
Start converting QItem
2020-12-15 21:03:46 +01:00
Eike Kettner
613696539f
Minor refactorings
2020-12-15 21:03:46 +01:00
Eike Kettner
e3f6892abd
Convert job record
2020-12-15 21:03:46 +01:00
Eike Kettner
3cef932ccd
Convert more records
2020-12-15 21:03:46 +01:00
Eike Kettner
10b49fccf8
Converting user and userimap records
2020-12-15 21:03:46 +01:00
Eike Kettner
f5ae389eea
Cleanup remember-me tokens periodically
2020-12-04 17:59:25 +01:00
Eike Kettner
290989f67f
Reorder correspondent person suggestion based on org relationship
2020-12-01 23:39:45 +01:00
Eike Kettner
3fabe0a582
Update to Scala 2.13.4
2020-11-27 20:26:24 +01:00
Eike Kettner
5fe532001b
Allow to specify document lanugage with the request
2020-11-23 20:49:01 +01:00
Eike Kettner
5034e12bec
Add a subject filter to scan-mailbox args
2020-11-13 23:15:20 +01:00
mergify[bot]
e5ce1fd45f
Merge pull request #437 from eikek/upload-improvements
...
Upload improvements
2020-11-12 22:58:08 +00:00
Eike Kettner
4fd6e02ec0
Improve glob and filter archive entries
2020-11-11 21:01:23 +01:00
Eike Kettner
27eb5d70de
Apply given tags in processing step
...
Issue: #346
2020-11-11 21:01:23 +01:00
Eike Kettner
55a6f7aaf6
Add more properties to upload meta data
2020-11-11 21:01:23 +01:00
Eike Kettner
746e04c624
Improve logging when creating preview images
2020-11-10 22:25:46 +01:00
Eike Kettner
10305bc82d
Minor improvements
2020-11-09 21:16:53 +01:00
Eike Kettner
29455d638c
Add startup task to find page counts of existing files
2020-11-09 20:35:35 +01:00
Eike Kettner
a77f34b7ba
Add a processing step to retrieve page counts
2020-11-09 11:08:24 +01:00
Eike Kettner
f4e50c5229
Provide endpoints to submit tasks to re-generate previews
...
The scaling factor can be given in the config file. When this changes,
images can be regenerated via POSTing to certain endpoints. It is
possible to regenerate just one attachment preview or all within a
collective.
2020-11-09 09:00:02 +01:00
Eike Kettner
6037b54959
Don't fail processing if generating preview fails
2020-11-09 00:05:11 +01:00
Eike Kettner
709848244c
Create tasks to generate all previews
...
There is a task to generate preview images per attachment. It can
either add them (if not present yet) or overwrite them (e.g. some
config has changed).
There is a task that selects all attachments without previews and
submits a task to create it. This is submitted on start automatically
to generate previews for all existing attachments.
2020-11-08 23:46:02 +01:00
Eike Kettner
7ba6baf6f0
Make preview image smaller
2020-11-08 15:12:56 +01:00
Eike Kettner
6db5c39d78
Fix converted filename
...
Mark it by default with a string from the config file.
Issue: 397
2020-11-08 09:45:03 +01:00
Eike Kettner
ef7cb4e779
Create a preview image of all files during processing
2020-11-08 01:25:59 +01:00
Eike Kettner
ab1139523a
Let the convert-all task retry when pdf conversion fails
2020-10-26 23:39:26 +01:00
Eike Kettner
b59696a9d3
Make sure to only remove/retry items in premature states
2020-10-26 23:39:26 +01:00
Eike Kettner
26e89bf84e
Edit org/person/equipment of multiple items
2020-10-26 13:35:47 +01:00
Eike Kettner
2e6026b817
Edit dates of multiple items
2020-10-26 13:16:03 +01:00
Eike Kettner
3c0b86cb19
Fix regex patterns used for NER
...
Patterns are split on whitespace by the nlp library and then compiled,
so each "word" must be a valid regex.
Fixes : #356
2020-10-21 00:55:14 +02:00
Eike Kettner
3f697f51aa
Autoformat
2020-10-06 23:31:09 +02:00
Eike Kettner
d4354b8b49
Skip pdf conversion if a converted file exists
...
For images the conversion also returns the extracted text. If this
would have failed to be saved, it is extracted in the following
text-extraction step.
2020-10-02 17:39:39 +02:00