Commit Graph

124 Commits

Author SHA1 Message Date
6b1156182c Add support for eml (rfc822 email) files 2020-03-19 22:42:40 +01:00
4ed7a137f7 Add support for archive files
Each attachment is now first extracted into potentially multiple ones,
if it is recognized as an archive. This is the first step in
processing. The original archive file is also stored and the resulting
attachments are associated with their original archive.

First support is implemented for zip files.
2020-03-19 22:42:27 +01:00
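
The extraction step described in 4ed7a137f7 can be pictured roughly as follows. This is a minimal Python sketch, not the project's code: the `Attachment` type, its `archive_name` field and the `extract_attachment` function are hypothetical names, and only zip input is handled, as in the commit.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attachment:
    name: str
    data: bytes
    # Hypothetical link back to the archive this file was extracted from.
    archive_name: Optional[str] = None


def extract_attachment(name: str, data: bytes) -> List[Attachment]:
    """Return the files inside a zip archive, or the input unchanged."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        # Not recognized as an archive: pass the file through as-is.
        return [Attachment(name, data)]
    extracted = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no file content
            extracted.append(
                Attachment(info.filename, zf.read(info), archive_name=name)
            )
    return extracted
```

The caller would still store the original archive bytes separately, so that each extracted attachment can be traced back to it.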
f0449dd2ce Properly initialize thread pools 2020-03-17 22:37:12 +01:00
00ca6b5697 Improve text analysis
- Search for consecutive labels

- Sort list of candidates by a weight

- Search for organizations using person labels
2020-03-17 22:34:50 +01:00
718e44a21c Add cleanup jobs task 2020-03-09 20:24:00 +01:00
854a596da3 Integrate periodic tasks
The first use case for periodic tasks is the cleanup of expired
invitation keys. This is part of a housekeeping periodic task.
2020-03-08 22:49:49 +01:00
616c333fa5 Implement storage routines for periodic scheduler 2020-03-08 13:56:23 +01:00
1e598bd902 Sketch a scheduler for running periodic tasks
Periodic tasks are special in that they are usually kept around and
started based on a schedule. A new component checks periodic tasks and
submits them in the queue once they are due.

In order to avoid duplicate periodic jobs, the tracker of a job is
used to store the periodic job id. Each time a periodic task is due,
it is first checked if there is a job running (or queued) for this
task.
2020-03-08 12:55:03 +01:00
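
The duplicate check described in 1e598bd902 might look like the sketch below. It is illustrative only: `PeriodicTask`, `Job`, `Scheduler` and the job states are hypothetical names, and rescheduling a task even when its previous job is still active is an assumption that the commit message does not state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PeriodicTask:
    id: str
    next_run: float  # time (in seconds) at which the task is next due
    interval: float  # seconds between two runs


@dataclass
class Job:
    tracker: str           # stores the id of the periodic task
    state: str = "queued"  # "queued", "running" or "done"


@dataclass
class Scheduler:
    queue: List[Job] = field(default_factory=list)

    def has_active_job(self, task_id: str) -> bool:
        """True if a queued or running job carries this task id as tracker."""
        return any(
            job.tracker == task_id and job.state in ("queued", "running")
            for job in self.queue
        )

    def check(self, tasks: List[PeriodicTask], now: float) -> List[Job]:
        """Submit a job for every due task that has no active job yet."""
        submitted = []
        for task in tasks:
            if task.next_run > now:
                continue  # not due yet
            if not self.has_active_job(task.id):
                job = Job(tracker=task.id)
                self.queue.append(job)
                submitted.append(job)
            # Assumption: the task is rescheduled either way.
            task.next_run = now + task.interval
        return submitted
```

Calling `check` twice while the first job is still queued submits nothing the second time, which is the behaviour the tracker is meant to guarantee.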
2f87065b2e sbt scalafmtAll 2020-02-25 20:55:00 +01:00
ec419c7bfd Adapt nix modules to new config 2020-02-22 12:40:56 +01:00
3f316ab4d0 Update config file doc 2020-02-20 21:10:00 +01:00
97305d27ff Integrate support for more files into processing and upload
The restriction that only pdf files can be uploaded is removed. All
files can now be uploaded, although processing may not handle every type. It is
still possible to restrict file uploads by types via a configuration.
2020-02-19 23:27:00 +01:00
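
A configurable upload restriction such as the one mentioned in 97305d27ff could be sketched as below. The function name, the pattern syntax and the rule that an empty list allows everything are assumptions made for illustration; the commit does not describe the actual configuration format.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def upload_allowed(mime_type: str, allowed_patterns: Sequence[str]) -> bool:
    """Check a MIME type against glob patterns taken from the configuration.

    Assumption: an empty pattern list means that uploads are unrestricted.
    """
    if not allowed_patterns:
        return True
    candidate = mime_type.lower()
    return any(fnmatchcase(candidate, p.lower()) for p in allowed_patterns)
```

With the hypothetical setting `["application/pdf", "image/*"]`, a PDF or PNG upload would be accepted and a `text/html` upload rejected.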
0dcc00836b Make logger configurable in system commands 2020-02-18 12:02:43 +01:00
bd605b8c94 Add first drafts for converting 2020-02-18 01:31:22 +01:00
e0682464b5 Configure pdf extraction; move Logger and DataType to common 2020-02-17 14:01:36 +01:00
3d615181e0 Early draft for text extraction 2020-02-17 01:57:22 +01:00
851ee7ef0f Reorganize processing code
Use separate modules for

- text extraction
- conversion to pdf
- text analysis
2020-02-15 21:25:25 +01:00
ce22b727b1 Add new convert module and sketch its integration 2020-02-11 00:33:52 +01:00
3be90d64d5 Move SystemCommand to common module 2020-02-10 22:23:06 +01:00
ba3865ef5e Starting to support more file types
First, files are converted to PDF for archiving. It is also easier
to create a preview. This is done via the `ConvertPdf` processing
task (which is not yet implemented).

Text extraction then tries first with the original file. If that
fails, OCR is done on the (potentially) converted pdf file.

To avoid losing information from the original file, it is saved using the
table `attachment_source`. If the original file is already a pdf, or
the conversion did not succeed, the `attachment` and
`attachment_source` record point to the same file.
2020-02-10 12:42:45 +01:00
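
The flow described in ba3865ef5e can be summarized in a small sketch. Everything here is hypothetical: `ProcessedFile` only mirrors the `attachment` and `attachment_source` tables by name, and the converter, text extractor and OCR step are passed in as plain functions rather than real tools. Running OCR on the original file when conversion fails is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProcessedFile:
    attachment: bytes         # file kept for archiving (the PDF if available)
    attachment_source: bytes  # the file as it was uploaded
    text: Optional[str]


def process(
    original: bytes,
    is_pdf: bool,
    convert_to_pdf: Callable[[bytes], Optional[bytes]],
    extract_text: Callable[[bytes], Optional[str]],
    ocr: Callable[[bytes], Optional[str]],
) -> ProcessedFile:
    # Convert only when the upload is not already a PDF; None means failure.
    converted = None if is_pdf else convert_to_pdf(original)
    # Without a converted file, both records point to the same original file.
    archived = original if converted is None else converted
    # Try the original file first, then fall back to OCR.
    text = extract_text(original)
    if not text:
        text = ocr(archived)
    return ProcessedFile(archived, original, text)
```

When the upload is already a PDF, `attachment` and `attachment_source` are the same object, matching the case where both records point to one file.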
fc3e22e399 Apply scalafmt to all files 2019-12-30 21:44:13 +01:00
2ad1586d00 Set stricter compile options and fix cookie data 2019-09-28 22:17:45 +02:00
831cd8b655 Initial version.
Features:

- Upload PDF files and have them analyzed

- Manage meta data and items

- See processing in webapp
2019-09-21 22:02:36 +02:00
6154e6a387 Initial application stub 2019-09-21 14:54:03 +02:00