Commit Graph

407 Commits

Author SHA1 Message Date
Eike Kettner
04ba14f802 Amend source form with tags and file-filter
Allow to define tags and a file filter per source.
2020-11-12 22:37:28 +01:00
Eike Kettner
10305bc82d Minor improvements 2020-11-09 21:16:53 +01:00
Eike Kettner
29455d638c Add startup task to find page counts of existing files 2020-11-09 20:35:35 +01:00
Eike Kettner
8c08bf233d Amend search results with attachment info
This again uses an additional query per item, so that some information
about each attachment is already available in the search results.
2020-11-09 14:24:28 +01:00
Eike Kettner
a77f34b7ba Add a processing step to retrieve page counts 2020-11-09 11:08:24 +01:00
Eike Kettner
d4bbb936b6 Count preview image sizes in insight data 2020-11-09 09:00:03 +01:00
Eike Kettner
f4e50c5229 Provide endpoints to submit tasks to re-generate previews
The scaling factor can be given in the config file. When this changes,
images can be regenerated via POSTing to certain endpoints. It is
possible to regenerate just one attachment preview or all within a
collective.
2020-11-09 09:00:02 +01:00
Eike Kettner
709848244c Create tasks to generate all previews
There is a task to generate preview images per attachment. It can
either add them (if not present yet) or overwrite them (e.g. some
config has changed).

There is a task that selects all attachments without previews and
submits a task to create it. This is submitted on start automatically
to generate previews for all existing attachments.
2020-11-08 23:46:02 +01:00
Eike Kettner
eede194352 Fix deleting preview files 2020-11-08 21:27:55 +01:00
Eike Kettner
757ad31165 Add a route to get the item preview
This is the first available preview of an attachment wrt position. If
all attachments have a preview image, the preview of the first
attachment is returned.
2020-11-08 15:12:56 +01:00
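
The selection rule described in the commit above (first available preview, ordered by attachment position) could be sketched roughly as follows. This is only an illustration: the function name and the `position` / `preview` keys are hypothetical and are not taken from the project's code.

```python
def item_preview(attachments):
    """Return the preview of the first attachment (by position) that has one.

    `attachments` is a list of dicts with the hypothetical keys
    'position' (int) and 'preview' (any value, or None if no preview exists).
    """
    for att in sorted(attachments, key=lambda a: a["position"]):
        if att["preview"] is not None:
            return att["preview"]
    # No attachment has a preview image.
    return None
```

If every attachment has a preview, the loop stops at the first attachment, which matches the behaviour the message describes.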
Eike Kettner
0841a33ae3 Add a table to hold the preview files 2020-11-08 01:25:38 +01:00
Eike Kettner
0461cfefe7 Fix sql error for mariadb <10.4
MariaDB below 10.4 doesn't support parentheses around selects for
`intersect` and `union`.

https://mariadb.com/kb/en/intersect/#parentheses

Fixes #404
2020-10-28 22:54:51 +01:00
Eike Kettner
b59696a9d3 Make sure to only remove/retry items in premature states 2020-10-26 23:39:26 +01:00
Eike Kettner
26e89bf84e Edit org/person/equipment of multiple items 2020-10-26 13:35:47 +01:00
Eike Kettner
2e6026b817 Edit dates of multiple items 2020-10-26 13:16:03 +01:00
Eike Kettner
d4043634ac Edit direction of multiple items 2020-10-26 12:48:15 +01:00
Eike Kettner
7ad37c8d26 Editing tags for multiple items 2020-10-26 11:54:04 +01:00
Eike Kettner
3e2d272746 Add unique constraint for equipment names
Fixes #370
2020-10-21 22:42:19 +02:00
Eike Kettner
3771587e55 Find duplicate tags without category 2020-10-19 00:30:41 +02:00
Eike Kettner
6a3386ce66 Fix sql comparison with optional values 2020-10-19 00:29:41 +02:00
Eike Kettner
80ddca9aa3 Add counter to joblog for correct log order
This is to distinguish log entries created at the same time.
2020-10-02 22:14:30 +02:00
Eike Kettner
d4354b8b49 Skip pdf conversion if a converted file exists
For images the conversion also returns the extracted text. If saving
this text failed, it is extracted again in the following
text-extraction step.
2020-10-02 17:39:39 +02:00
Eike Kettner
b6f23b038a Fix finding attachments for retries
The attachments to process again must be searched in sources and
archives, too.
2020-10-02 17:39:34 +02:00
Eike Kettner
e26d7129e7 Add fix for mariadb text columns
The `text` data type can only store up to 64 KB of data; `mediumtext`
stores up to 16 MB and `longtext` up to 4 GB.

Issue: #297
2020-10-02 16:50:51 +02:00
Eike Kettner
552cdac1d3 Apply flyway api changes 2020-09-28 15:12:10 +02:00
Eike Kettner
f6f63000be Prepend a duplicate check when uploading files 2020-09-23 23:37:00 +02:00
Eike Kettner
c658677032 Autoformat 2020-09-09 00:29:32 +02:00
Eike Kettner
eb11b33028 Fix mariadb changesets 2020-09-07 20:02:50 +02:00
Eike Kettner
76ccfb8a81 Only learn from confirmed items
Text classification should only learn from confirmed items. Log if
classification is disabled when processing an item.
2020-09-07 13:04:40 +02:00
Eike Kettner
cb1a9e0699 Use separate sql migration for h2 2020-09-07 13:04:29 +02:00
Eike Kettner
06879456a6 Change job priority on queue page 2020-09-05 18:50:58 +02:00
Eike Kettner
4309bd8dfd Some cleanup 2020-09-02 21:22:30 +02:00
Eike Kettner
316b490008 Implement learning a text classifier from collective data 2020-09-02 18:28:14 +02:00
Eike Kettner
68bb65572b Integrate learn-classifier task into the app 2020-09-02 18:28:14 +02:00
Eike Kettner
8c4f2e702b Add classifier settings 2020-09-02 18:28:14 +02:00
Eike Kettner
de5b33c40d Add updated column to some tables 2020-08-24 21:30:52 +02:00
Eike Kettner
96d2f948f2 Use collective's addressbook to configure regexner 2020-08-24 14:40:52 +02:00
Eike Kettner
3986487f11 Add api docs and cleanup 2020-08-13 21:22:54 +02:00
Eike Kettner
69674eb485 Improve job-queue query to make sure jobs across all states show up 2020-08-13 01:06:13 +02:00
Eike Kettner
41ea071555 Add a task to convert all pdfs that have not been converted 2020-08-13 01:06:13 +02:00
Eike Kettner
07e9a9767e Add a task to re-process files of an item 2020-08-12 22:29:56 +02:00
Eike Kettner
098e4cf868 Fix uploading to enabled/disabled source endpoints 2020-08-09 09:21:23 +02:00
Eike Kettner
06ad9ac46c Add routes to conveniently set/toggle tags 2020-08-08 15:08:04 +02:00
Eike Kettner
1c8b66194b Add a route to return used tags
This is part of the `/insights` route without queries for file usage.
2020-08-08 08:35:35 +02:00
Eike Kettner
a4796f3f7f Return more tag details with item insights 2020-08-08 00:41:20 +02:00
Eike Kettner
f3ba224124 Add missing organization/person/equipment routes 2020-08-07 01:30:43 +02:00
Eike Kettner
070c2b5e5f Allow to search by tag categories
The server accepts a list of tag categories for inclusion and
exclusion. For the include list, only items that have at least one
tag of each category are returned. For the exclude list, only items
that have no tag in any of these categories are returned.
2020-08-06 21:43:27 +02:00
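
The include/exclude semantics from the commit above can be illustrated with a small in-memory predicate. This is a sketch under assumptions: the real filter presumably runs as a database query, and the function name and the tuple layout for tags are hypothetical.

```python
def matches_categories(item_tags, include, exclude):
    """Decide whether an item passes the tag-category filter.

    `item_tags` is a list of (tag_name, category) tuples, where category
    may be None for tags without a category (hypothetical layout).
    """
    categories = {cat for _, cat in item_tags if cat is not None}
    # Include list: the item needs at least one tag of *each* category.
    has_all_included = all(cat in categories for cat in include)
    # Exclude list: the item must have no tag in *any* of these categories.
    has_no_excluded = not any(cat in categories for cat in exclude)
    return has_all_included and has_no_excluded
```

With empty include and exclude lists, every item passes, since neither condition restricts anything.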
Eike Kettner
09d74b7e80 Return item notes with search results
To keep the response from becoming very large, an admin can define a
limit on how much is returned.
2020-08-05 00:09:37 +02:00
Eike Kettner
209c068436 Use keywords in pdfs to search for existing tags
During processing, keywords stored in PDF metadata are looked up in
the tag database, and any existing tags are associated with the
item.

See #175
2020-07-19 00:28:04 +02:00
Eike Kettner
3d49ceaab5 Use ocrmypdf tool to create pdf/a during conversion
- Use another external tool to convert pdf to pdf which also adds the
  extracted text as another layer into the pdf

- Although not used, the external conversion routine will now check
  for an existing text file that is named as the pdf file with extension
  `.txt`. If present it is included in the conversion result and will be
  used as the extracted text.

- Text extraction for pdf files now happens on the converted file,
  because it may already contain the text from the conversion step and
  thus avoids running OCR twice.

- Errors during conversion are not fatal; processing continues
  without a converted file.
2020-07-18 17:19:29 +02:00
Eike Kettner
c697501571 Add folders sql changeset for mariadb 2020-07-14 23:22:52 +02:00
Eike Kettner
5b01c93711 Add a folder-id to item processing
This allows defining a folder when uploading files. All generated
items are associated with this folder on creation.
2020-07-14 23:18:39 +02:00
Eike Kettner
ec7f027b4e Fix postgres changeset for folders 2020-07-12 16:15:02 +02:00
Eike Kettner
259526a088 Organize imports 2020-07-12 13:51:52 +02:00
Eike Kettner
22fa1dba13 Apply folder restriction to fulltext only search
And update index when folder changes.
2020-07-12 13:50:45 +02:00
Eike Kettner
e387b5513f Remove items in non-member folders from sql search results 2020-07-11 22:25:56 +02:00
Eike Kettner
5b95fddf3d Make item queries depend on the account-id
Now the user is required, too, to list items.
2020-07-11 21:54:51 +02:00
Eike Kettner
0df541f30a Allow to search by folders 2020-07-11 16:52:13 +02:00
Eike Kettner
86443e10a6 Set the folder of an item 2020-07-11 12:57:17 +02:00
Eike Kettner
2ab0b5e222 Rename space -> folder 2020-07-11 11:54:23 +02:00
Eike Kettner
60a08fc786 Return member count and if current user is owner or member 2020-07-11 01:30:29 +02:00
Eike Kettner
ea4ab11195 Allow to only return owning spaces 2020-07-11 01:30:28 +02:00
Eike Kettner
752a94a9e2 Implement space operations 2020-07-11 01:30:28 +02:00
Eike Kettner
7ec0fc2593 Add endpoints for managing spaces to openapi spec 2020-07-11 01:30:28 +02:00
Eike Kettner
13ad5e3219 Setup space entities 2020-07-11 01:30:28 +02:00
Eike Kettner
347a029af8 Scalafix organize-imports 2020-06-28 21:20:47 +02:00
Eike Kettner
41c0f70d3b Fix cancelling jobs
A request to cancel a job was not processed correctly. The cancelling
routine of a task must run, regardless of the (non-final) state. Now
it works like this: if a job is currently running, it is interrupted
and its cancel routine is invoked. It then enters "cancelled" state.
If it is stuck, it is loaded and only its cancel routine is run. If it
is in a final state or waiting, it is removed from the queue.
2020-06-26 23:08:27 +02:00
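
The cancel behaviour described in the commit above amounts to a dispatch on the job's current state. The sketch below only restates that description; the state names, the action labels, and the set of final states are assumptions, not the project's actual identifiers.

```python
# Assumed names for final states; the commit message does not list them.
FINAL_STATES = {"failed", "cancelled", "success"}


def cancel_actions(state):
    """Return the ordered steps taken when a cancel request arrives."""
    if state == "running":
        # Interrupt the running job, invoke its cancel routine,
        # then move it to the "cancelled" state.
        return ["interrupt", "run_cancel_routine", "set_cancelled"]
    if state == "stuck":
        # Load the job and run only its cancel routine.
        return ["load", "run_cancel_routine"]
    if state == "waiting" or state in FINAL_STATES:
        # Nothing is executing: just drop the job from the queue.
        return ["remove_from_queue"]
    raise ValueError(f"unknown job state: {state}")
```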
Eike Kettner
23477e34f9 Change columns from timestamp to datetime
In MariaDB the timestamp type has some properties that make it not a
good fit.
2020-06-26 17:07:00 +02:00
Eike Kettner
64c96942a9 Fix deleting items that have sent mails 2020-06-24 23:47:58 +02:00
Eike Kettner
532caed84c Consistent logging of request/responses to solr
Using a middleware. Also add missing changesets for mariadb.
2020-06-24 21:25:46 +02:00
Eike Kettner
7df77208fe Fix duplicate search results 2020-06-24 01:15:53 +02:00
Eike Kettner
7d7460b1c9 Cleanup + hiding false errors from log 2020-06-24 00:23:22 +02:00
Eike Kettner
d5c9923a6d Add a route that only searches the full-text index
It returns the results in the same order as received from the index to
preserve the relevance ordering.
2020-06-24 00:03:17 +02:00
Eike Kettner
d9f0f05613 Refactor findItemsWithTags to be more generally useful 2020-06-23 21:27:01 +02:00
Eike Kettner
cfe5aa8894 Use no-op fts-client if disabled + push this flag to the webui 2020-06-21 21:06:08 +02:00
Eike Kettner
0d8b03fc61 Add backend operations for re-creating the full-text index 2020-06-21 15:46:51 +02:00
Eike Kettner
9acea8307d Update full-text index when changing data 2020-06-21 00:33:39 +02:00
Eike Kettner
1f4ff0d4c4 Add language to schema, extend fts-client 2020-06-20 22:44:47 +02:00
Eike Kettner
2a0bf24088 Setup solr schema and index all data using a system task
The task runs on application start. It sets the schema using solr's
schema api and then indexes all data in the database. Each step is
memorized so that it is not executed again on subsequent starts.
2020-06-19 21:37:22 +02:00
Eike Kettner
1f4220eccb Index existing data in solr 2020-06-19 00:43:35 +02:00
Eike Kettner
60c079f664 Add task to index current database state 2020-06-18 22:38:45 +02:00
Eike Kettner
522daaf57e Introducing fts client into codebase 2020-06-17 23:20:46 +02:00
Eike Kettner
4028b7979e Fix mariadb timestamp columns
MariaDB automatically inserts the current time, even when saying `SET
datecol = null`.
2020-06-17 21:51:30 +02:00
Eike Kettner
84a26461ed Add a route to update the name of an attachment 2020-06-14 17:03:07 +02:00
Eike Kettner
88234986e6 Make name field search in item name only
Now there is an `allNames` field that searches names of multiple
things.
2020-06-13 21:17:29 +02:00
Eike Kettner
67666595eb Make name search case insensitive 2020-06-13 21:17:15 +02:00
Eike Kettner
f30c8a5e4d Add new search term that searches in all meta data
A field that searches via substring search in names of correspondents
and concerned meta data.
2020-06-13 17:08:26 +02:00
Eike Kettner
1d2a6e6caa Add endpoint to search for items and return their tags
This is a more expensive query, since the tags must be resolved per
item. This is now implemented by doing additional queries while
caching each resolved tag.
2020-06-07 15:18:28 +02:00
Eike Kettner
6abdb95f02 Reformatting 2020-06-06 20:52:23 +02:00
Eike Kettner
071ab60a5c Remove i_date query binding 2020-06-06 15:15:29 +02:00
Eike Kettner
d5819eab35 Fix offset/limit clause for mariadb
MariaDB wants first limit and then offset (optionally), postgres
doesn't care.
2020-06-06 11:13:33 +02:00
Eike Kettner
e5b90eff34 Allow client to load items in batches 2020-06-06 11:05:15 +02:00
Eike Kettner
4b0eb650f2 Rename package to avoid name clashes 2020-05-25 16:22:09 +02:00
Eike Kettner
56624515a5 ScalafmtAll 2020-05-25 13:56:06 +02:00
Eike Kettner
ee394eae86 Try to streamline the different impls for MimeType 2020-05-25 09:24:24 +02:00
Eike Kettner
3cb738568f Allow to change position of attachments 2020-05-24 17:30:25 +02:00
Eike Kettner
4694433e38 Fix attachment positions
It worked for new items, because the implicit offset was 0. When
adding archives to existing items, there are already attachments and
the new attachments are added to the end. This won't work if files are
added concurrently, because there is no quick and reliable way to
determine the offset then.
2020-05-24 15:13:30 +02:00
Eike Kettner
1dde43e092 Only process attachments in task arguments
When files are added to an item, the attachments already present must
not be "re-processed".
2020-05-24 13:29:38 +02:00
Eike Kettner
f4949446e3 Allow to specify an item id to amend files to existing items 2020-05-23 20:15:55 +02:00
Eike Kettner
25d089da6c Update state and proposals only on invalid items
Invalid items are those that are not ready and not shown to the user.
Metadata should only be changed if the item has not yet been shown to
the user.
2020-05-23 15:46:24 +02:00
Eike Kettner
f16632bc7f Allow a collective to disable the integration endpoint 2020-05-23 14:29:24 +02:00
Eike Kettner
9f9dd6c0fb Change routes for scan-mailbox task to allow multiple tasks per user 2020-05-21 22:04:45 +02:00
Eike Kettner
f2d67dc816 Initial impl of import from mailbox user task 2020-05-20 17:52:38 +02:00
Eike Kettner
c9de74fd91 Add imap settings 2020-05-18 08:46:04 +02:00
Eike Kettner
0a5501dcb0 Fix findFileByChecksum 2020-05-10 21:03:12 +02:00
Eike Kettner
bd5066740d Joex depends on backend module
The job executor depends on backend module, since it may control the
application via user tasks. The `ONode` can now be moved from the
store module into the backend module.
2020-05-10 21:03:12 +02:00
Eike Kettner
c41cdeefec Update scalafmt to 2.5.1 + scalafmtAll 2020-05-04 23:53:57 +02:00
Eike Kettner
96c5e99f19 Fix scaladoc tag
There is no scaladoc tag @implNote.
2020-04-30 22:04:29 +02:00
Eike Kettner
a939839041 Delete single attachments 2020-04-26 23:11:49 +02:00
Eike Kettner
ffc1cdee51 Sort due items by their earliest due date 2020-04-22 22:21:28 +02:00
Eike Kettner
bbfd694b45 Allow to start a user task once 2020-04-22 21:08:45 +02:00
Eike Kettner
2723d6b43b Implement notify-due-items task 2020-04-22 21:08:45 +02:00
Eike Kettner
3a90d874a5 Improve form 2020-04-22 21:08:45 +02:00
Eike Kettner
ad772c0c25 Server-side stub impl for notify-due-items 2020-04-22 21:08:45 +02:00
Eike Kettner
9656ba62f4 scalafmtAll 2020-03-26 18:26:00 +01:00
Eike Kettner
09ea724c13 Store message-id of eml files 2020-03-25 22:00:51 +01:00
Eike Kettner
43efb4e6ba Use doobie support from emil project 2020-03-24 23:40:29 +01:00
Eike Kettner
7e6eec9533 Include archive infos in item detail 2020-03-22 21:35:50 +01:00
Eike Kettner
3703dce9a6 Update fs2 to 2.3.0 2020-03-20 22:47:09 +01:00
Eike Kettner
74a6cf1dd1 Remove unused migration directory 2020-03-19 22:43:41 +01:00
Eike Kettner
b1a1a2b837 Add archives to collective insights 2020-03-19 22:43:18 +01:00
Eike Kettner
439aaee27b Search archives when looking for files via checksum 2020-03-19 22:42:48 +01:00
Eike Kettner
4ed7a137f7 Add support for archive files
Each attachment is now first extracted into potentially multiple ones,
if it is recognized as an archive. This is the first step in
processing. The original archive file is also stored and the resulting
attachments are associated to their original archive.

First support is implemented for zip files.
2020-03-19 22:42:27 +01:00
Eike Kettner
00ca6b5697 Improve text analysis
- Search for consecutive labels

- Sort list of candidates by a weight

- Search for organizations using person labels
2020-03-17 22:34:50 +01:00
Eike Kettner
718e44a21c Add cleanup jobs task 2020-03-09 20:24:00 +01:00
Eike Kettner
854a596da3 Integrate periodic tasks
The first use case for periodic tasks is the cleanup of expired
invitation keys. This is part of a house-keeping periodic task.
2020-03-08 22:49:49 +01:00
Eike Kettner
616c333fa5 Implement storage routines for periodic scheduler 2020-03-08 13:56:23 +01:00
Eike Kettner
1e598bd902 Sketch a scheduler for running periodic tasks
Periodic tasks are special in that they are usually kept around and
started based on a schedule. A new component checks periodic tasks and
submits them in the queue once they are due.

In order to avoid duplicate periodic jobs, the tracker of a job is
used to store the periodic job id. Each time a periodic task is due,
it is first checked if there is a job running (or queued) for this
task.
2020-03-08 12:55:03 +01:00
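
The duplicate check described in the commit above (store the periodic job id in the job's tracker and skip submission while a job for it is queued or running) might look roughly like this. It is a minimal in-memory sketch: the class names, the state names, and the method names are hypothetical, and the real implementation presumably works against the database.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    tracker: str  # holds the id of the periodic task that created the job
    state: str    # assumed state names: "waiting", "running", or a final state


@dataclass
class Queue:
    jobs: list = field(default_factory=list)

    def has_active(self, tracker):
        """True if a job for this tracker is queued or currently running."""
        return any(
            j.tracker == tracker and j.state in ("waiting", "running")
            for j in self.jobs
        )

    def submit_periodic(self, periodic_id):
        """Submit a job for a due periodic task unless one is still active."""
        if self.has_active(periodic_id):
            return False
        self.jobs.append(Job(tracker=periodic_id, state="waiting"))
        return True
```

A second submission for the same periodic task is rejected until the earlier job reaches a final state.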
Eike Kettner
42c59179b8 Fix search by checksum to include source files 2020-03-02 20:56:32 +01:00
Eike Kettner
2f87065b2e sbt scalafmtAll 2020-02-25 20:55:00 +01:00
Eike Kettner
cc16b0c024 Fix query to also work with mariadb 2020-02-24 13:34:54 +01:00
Eike Kettner
661cc3e65f Fix deleting attachments (again) 2020-02-23 20:18:13 +01:00
Eike Kettner
d937e0501a Add source files to collective insights 2020-02-23 20:17:53 +01:00
Eike Kettner
957073fe62 Return info about original files in item detail
This adds data to the current rest api.
2020-02-23 14:25:32 +01:00
Eike Kettner
74a037887d Fix deleting items and attachments to also remove the binary files 2020-02-22 00:54:55 +01:00
Eike Kettner
72fd3b1a25 Implement downloading original file 2020-02-20 22:33:57 +01:00
Eike Kettner
97305d27ff Integrate support for more files into processing and upload
The restriction that only pdf files can be uploaded is removed. All
files can now be uploaded. The processing may not process all. It is
still possible to restrict file uploads by types via a configuration.
2020-02-19 23:27:00 +01:00
Eike Kettner
ba3865ef5e Starting to support more file types
First, files are converted to PDF for archiving. It is also easier
to create a preview. This is done via the `ConvertPdf` processing
task (which is not yet implemented).

Text extraction then tries first with the original file. If that
fails, OCR is done on the (potentially) converted pdf file.

To not lose information of the original file, it is saved using the
table `attachment_source`. If the original file is already a pdf, or
the conversion did not succeed, the `attachment` and
`attachment_source` record point to the same file.
2020-02-10 12:42:45 +01:00
Eike Kettner
5c37efeaba Apply scalafmt to all files 2020-02-09 01:54:26 +01:00
Eike Kettner
9b66604b96 Include item notes in search 2020-02-08 13:39:06 +01:00
Eike Kettner
6d0c140e8e Add mariadb database migration 2020-01-12 01:17:49 +01:00
Eike Kettner
d535130c9e Provide email proposals from address book 2020-01-12 01:04:42 +01:00
Eike Kettner
2ecfb679d9 Add routes to retrieve sent mails 2020-01-11 12:58:04 +01:00
Eike Kettner
b795a22992 Send mails for items 2020-01-10 00:45:29 +01:00
Eike Kettner
2d69d39dd1 Connect multiple items to a mail 2020-01-09 18:20:59 +01:00
Eike Kettner
7a3289c41d Prepare sending mail 2020-01-08 22:44:34 +01:00
Eike Kettner
32050a9faf Finish mail settings 2020-01-07 00:20:28 +01:00
Eike Kettner
f235f3a030 Starting with mail functionality 2020-01-05 23:23:28 +01:00
Eike Kettner
2e3454c7a1 Starting with mail settings 2020-01-05 15:31:32 +01:00
Eike Kettner
8814de3c38 Allow simple search when listing meta data 2020-01-02 20:21:49 +01:00
Eike Kettner
eb6c483ef0 Add route to check for files by their checksum
Adapt scripts in `tools/` to check for existing files using these
routes.
2019-12-31 23:45:02 +01:00
Eike Kettner
d05e919eb4 Update doobie, use legacy java.time conversions 2019-12-31 13:55:09 +01:00
Eike Kettner
fc3e22e399 Apply scalafmt to all files 2019-12-30 21:44:13 +01:00
Eike Kettner
a9e70401de Update dependencies 2019-12-28 12:38:11 +01:00
Eike Kettner
2ad1586d00 Set stricter compile options and fix cookie data 2019-09-28 22:17:45 +02:00
Eike Kettner
831cd8b655 Initial version.
Features:

- Upload PDF files and let them be analyzed

- Manage meta data and items

- See processing in webapp
2019-09-21 22:02:36 +02:00
Eike Kettner
6154e6a387 Initial application stub 2019-09-21 14:54:03 +02:00