Commit Graph

339 Commits

Author SHA1 Message Date
3771587e55 Find duplicate tags without category 2020-10-19 00:30:41 +02:00
6a3386ce66 Fix sql comparison with optional values 2020-10-19 00:29:41 +02:00
80ddca9aa3 Add counter to joblog for correct log order
This is to distinguish log entries created at the same time.
2020-10-02 22:14:30 +02:00
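The ordering problem behind commit 80ddca9aa3 can be sketched as follows. This is a minimal illustration, not the project's code: `LogEntry`, `make_entry`, and `ordered` are hypothetical names, and the idea is only that a counter breaks ties between entries that share a timestamp.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class LogEntry:
    created: int   # creation time in milliseconds; several entries may share it
    counter: int   # ever-increasing number used as a tie-breaker
    message: str


_next_counter = count(1)  # hypothetical in-process counter


def make_entry(created: int, message: str) -> LogEntry:
    """Create an entry and stamp it with the next counter value."""
    return LogEntry(created, next(_next_counter), message)


def ordered(entries):
    """Sort by timestamp first, then by counter for equal timestamps."""
    return sorted(entries, key=lambda e: (e.created, e.counter))


entries = [make_entry(1000, "start"), make_entry(1000, "step 1"), make_entry(1000, "done")]
shuffled = [entries[2], entries[0], entries[1]]
print([e.message for e in ordered(shuffled)])  # ['start', 'step 1', 'done']
```

Sorting on the timestamp alone would leave the relative order of the three entries undefined, because they were all created in the same millisecond.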
d4354b8b49 Skip pdf conversion if a converted file exists
For images, the conversion also returns the extracted text. If saving
that text failed, it is extracted again in the following
text-extraction step.
2020-10-02 17:39:39 +02:00
b6f23b038a Fix finding attachments for retries
Attachments to be processed again must also be looked up in sources
and archives.
2020-10-02 17:39:34 +02:00
e26d7129e7 Add fix for mariadb text columns
The `text` data type can store only up to 64 KB of data; `mediumtext`
stores up to 16 MB and `longtext` up to 4 GB.

Issue: #297
2020-10-02 16:50:51 +02:00
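The size limits mentioned in commit e26d7129e7 can be made concrete with a small sketch. The byte limits below are the documented MariaDB maxima for these column types; the helper `smallest_text_type` is a hypothetical name used only for illustration and is not part of the project.

```python
# Maximum payload in bytes for MariaDB text column types.
LIMITS = [
    ("text", 2**16 - 1),        # 65,535 bytes, roughly 64 KB
    ("mediumtext", 2**24 - 1),  # 16,777,215 bytes, roughly 16 MB
    ("longtext", 2**32 - 1),    # 4,294,967,295 bytes, roughly 4 GB
]


def smallest_text_type(value: str) -> str:
    """Return the smallest column type that can hold `value` encoded as UTF-8."""
    size = len(value.encode("utf-8"))
    for name, limit in LIMITS:
        if size <= limit:
            return name
    raise ValueError("value exceeds the longtext limit")


print(smallest_text_type("a" * 65535))  # text
print(smallest_text_type("a" * 65536))  # mediumtext
print(smallest_text_type("é" * 40000))  # mediumtext (80,000 bytes in UTF-8)
```

The limits count bytes, not characters, so text with multi-byte characters reaches the `text` limit sooner than its character count suggests.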
552cdac1d3 Apply flyway api changes 2020-09-28 15:12:10 +02:00
f6f63000be Prepend a duplicate check when uploading files 2020-09-23 23:37:00 +02:00
c658677032 Autoformat 2020-09-09 00:29:32 +02:00
eb11b33028 Fix mariadb changesets 2020-09-07 20:02:50 +02:00
76ccfb8a81 Only learn from confirmed items
Text classification should only learn from confirmed items. Log if
classification is disabled when processing an item.
2020-09-07 13:04:40 +02:00
cb1a9e0699 Use separate sql migration for h2 2020-09-07 13:04:29 +02:00
06879456a6 Change job priority on queue page 2020-09-05 18:50:58 +02:00
4309bd8dfd Some cleanup 2020-09-02 21:22:30 +02:00
316b490008 Implement learning a text classifier from collective data 2020-09-02 18:28:14 +02:00
68bb65572b Integrate learn-classifier task into the app 2020-09-02 18:28:14 +02:00
8c4f2e702b Add classifier settings 2020-09-02 18:28:14 +02:00
de5b33c40d Add updated column to some tables 2020-08-24 21:30:52 +02:00
96d2f948f2 Use collective's addressbook to configure regexner 2020-08-24 14:40:52 +02:00
3986487f11 Add api docs and cleanup 2020-08-13 21:22:54 +02:00
69674eb485 Improve job-queue query to make sure jobs across all states show up 2020-08-13 01:06:13 +02:00
41ea071555 Add a task to convert all pdfs that have not been converted 2020-08-13 01:06:13 +02:00
07e9a9767e Add a task to re-process files of an item 2020-08-12 22:29:56 +02:00
098e4cf868 Fix uploading to enabled/disabled source endpoints 2020-08-09 09:21:23 +02:00
06ad9ac46c Add routes to conveniently set/toggle tags 2020-08-08 15:08:04 +02:00
1c8b66194b Add a route to return used tags
This is part of the `/insights` route without queries for file usage.
2020-08-08 08:35:35 +02:00
a4796f3f7f Return more tag details with item insights 2020-08-08 00:41:20 +02:00
f3ba224124 Add missing organization/person/equipment routes 2020-08-07 01:30:43 +02:00
070c2b5e5f Allow searching by tag categories
The server accepts a list of tag categories for inclusion and
exclusion. The include list returns only items that have at least
one tag of each listed category. The exclude list returns only items
that have no tag in any of the listed categories.
2020-08-06 21:43:27 +02:00
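The include/exclude semantics of commit 070c2b5e5f can be sketched as a predicate over a single item. This is an illustrative sketch only: `matches` and its parameters are hypothetical names, and the real server applies the filter in its search queries rather than in memory.

```python
def matches(item_categories, include, exclude):
    """Decide whether an item passes the category filter.

    item_categories: categories of all tags attached to the item
    include: the item needs at least one tag of EACH of these categories
    exclude: the item must have NO tag in ANY of these categories
    """
    cats = set(item_categories)
    has_every_included = all(c in cats for c in include)
    has_no_excluded = not any(c in cats for c in exclude)
    return has_every_included and has_no_excluded


print(matches(["doctype", "state"], include=["doctype", "state"], exclude=[]))  # True
print(matches(["doctype"], include=["doctype", "state"], exclude=[]))           # False
print(matches(["doctype", "year"], include=["doctype"], exclude=["year"]))      # False
```

With both lists empty the predicate accepts every item, which matches the intuition that no category filter was requested.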
09d74b7e80 Return item notes with search results
To keep the response from growing very large, an admin can define a
limit on how much of the notes to return.
2020-08-05 00:09:37 +02:00
209c068436 Use keywords in pdfs to search for existing tags
During processing, keywords stored in the PDF metadata are looked up
in the tag database, and any matching existing tags are associated
with the item.

See #175
2020-07-19 00:28:04 +02:00
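The lookup described in commit 209c068436 might look roughly like the sketch below. It is not the project's implementation: `tags_from_keywords` is a hypothetical name, the tag database is modelled as a plain dictionary from tag name to tag id, and the case-insensitive, whitespace-trimmed matching is an assumption.

```python
def tags_from_keywords(keywords, existing_tags):
    """Return ids of existing tags whose names match a PDF keyword.

    Keywords without a matching tag are ignored; no new tags are created.
    """
    # Assumption: names are compared case-insensitively after trimming.
    by_name = {name.strip().lower(): tag_id for name, tag_id in existing_tags.items()}
    found = []
    for keyword in keywords:
        tag_id = by_name.get(keyword.strip().lower())
        if tag_id is not None and tag_id not in found:
            found.append(tag_id)
    return found


existing = {"Invoice": "t1", "Tax": "t2"}
print(tags_from_keywords(["invoice", "2020", " Tax "], existing))  # ['t1', 't2']
```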
3d49ceaab5 Use ocrmypdf tool to create pdf/a during conversion
- Use another external tool to convert pdf to pdf which also adds the
  extracted text as another layer into the pdf

- Although not used, the external conversion routine will now check
  for an existing text file that is named as the pdf file with extension
  `.txt`. If present it is included in the conversion result and will be
  used as the extracted text.

- Text extraction for pdf files now happens on the converted file,
  because it may already contain the text from the conversion step;
  this avoids running OCR twice.

- Errors during conversion are not fatal; processing continues
  without a converted file.
2020-07-18 17:19:29 +02:00
c697501571 Add folders sql changeset for mariadb 2020-07-14 23:22:52 +02:00
5b01c93711 Add a folder-id to item processing
This allows defining a folder when uploading files. All generated
items are associated with this folder on creation.
2020-07-14 23:18:39 +02:00
ec7f027b4e Fix postgres changeset for folders 2020-07-12 16:15:02 +02:00
259526a088 Organize imports 2020-07-12 13:51:52 +02:00
22fa1dba13 Apply folder restriction to fulltext only search
And update index when folder changes.
2020-07-12 13:50:45 +02:00
e387b5513f Remove items in non-member folders from sql search results 2020-07-11 22:25:56 +02:00
5b95fddf3d Make item queries depend on the account-id
The user is now also required to list items.
2020-07-11 21:54:51 +02:00
0df541f30a Allow to search by folders 2020-07-11 16:52:13 +02:00
86443e10a6 Set the folder of an item 2020-07-11 12:57:17 +02:00
2ab0b5e222 Rename space -> folder 2020-07-11 11:54:23 +02:00
60a08fc786 Return member count and whether the current user is owner or member 2020-07-11 01:30:29 +02:00
ea4ab11195 Allow returning only owned spaces 2020-07-11 01:30:28 +02:00
752a94a9e2 Implement space operations 2020-07-11 01:30:28 +02:00
7ec0fc2593 Add endpoints for managing spaces to openapi spec 2020-07-11 01:30:28 +02:00
13ad5e3219 Setup space entities 2020-07-11 01:30:28 +02:00
347a029af8 Scalafix organize-imports 2020-06-28 21:20:47 +02:00
41c0f70d3b Fix cancelling jobs
A request to cancel a job was not processed correctly. The cancelling
routine of a task must run, regardless of the (non-final) state. Now
it works like this: if a job is currently running, it is interrupted
and its cancel routine is invoked. It then enters "cancelled" state.
If it is stuck, it is loaded and only its cancel routine is run. If it
is in a final state or waiting, it is removed from the queue.
2020-06-26 23:08:27 +02:00
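The cancel behaviour described in commit 41c0f70d3b can be summarised as a dispatch on the job's state. The sketch below is a simplified model, not the project's scheduler: the state names, the `FINAL` set, and the action strings are hypothetical and exist only to make the three cases explicit.

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    STUCK = "stuck"
    SUCCESS = "success"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed set of final states for this sketch.
FINAL = {State.SUCCESS, State.FAILED, State.CANCELLED}


def cancel_actions(state: State) -> list:
    """Return the steps a cancel request triggers for a job in `state`."""
    if state is State.RUNNING:
        # Interrupt the job, run its cancel routine, then mark it cancelled.
        return ["interrupt", "run_cancel_routine", "mark_cancelled"]
    if state is State.STUCK:
        # Nothing is running: load the job and run only its cancel routine.
        return ["load_job", "run_cancel_routine"]
    if state in FINAL or state is State.WAITING:
        # No cleanup needed: drop the job from the queue.
        return ["remove_from_queue"]
    raise ValueError(f"unhandled state: {state}")


print(cancel_actions(State.RUNNING))  # ['interrupt', 'run_cancel_routine', 'mark_cancelled']
print(cancel_actions(State.STUCK))    # ['load_job', 'run_cancel_routine']
print(cancel_actions(State.WAITING))  # ['remove_from_queue']
```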
23477e34f9 Change columns from timestamp to datetime
In MariaDB, the timestamp type has some properties that make it a
poor fit.
2020-06-26 17:07:00 +02:00