Eike Kettner
10305bc82d
Minor improvements
2020-11-09 21:16:53 +01:00
Eike Kettner
29455d638c
Add startup task to find page counts of existing files
2020-11-09 20:35:35 +01:00
Eike Kettner
8c08bf233d
Amend search results with attachment info
...
This again uses one more query per item to retrieve some information
about each attachment that is already in the search results.
2020-11-09 14:24:28 +01:00
Eike Kettner
a77f34b7ba
Add a processing step to retrieve page counts
2020-11-09 11:08:24 +01:00
Eike Kettner
d4bbb936b6
Count preview image sizes in insight data
2020-11-09 09:00:03 +01:00
Eike Kettner
f4e50c5229
Provide endpoints to submit tasks to re-generate previews
...
The scaling factor can be given in the config file. When this changes,
images can be regenerated via POSTing to certain endpoints. It is
possible to regenerate just one attachment preview or all within a
collective.
2020-11-09 09:00:02 +01:00
Eike Kettner
709848244c
Create tasks to generate all previews
...
There is a task to generate preview images per attachment. It can
either add them (if not present yet) or overwrite them (e.g. some
config has changed).
There is a task that selects all attachments without previews and
submits a task for each one to create its preview. This is submitted
automatically on start to generate previews for all existing
attachments.
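
The add-or-overwrite selection described above can be sketched as
follows. This is only an illustration: `previews_to_generate` and its
parameters are hypothetical names, not the project's API.

```python
from typing import Iterable, List, Set


def previews_to_generate(
    attachment_ids: Iterable[str],
    have_preview: Set[str],
    overwrite: bool = False,
) -> List[str]:
    """Pick the attachments for which a preview task would be submitted.

    With overwrite=False only attachments lacking a preview are selected;
    with overwrite=True every attachment is selected again (for example
    after a config change).
    """
    if overwrite:
        return list(attachment_ids)
    return [a for a in attachment_ids if a not in have_preview]
```

On startup, the non-overwriting variant would be the one that applies,
since only missing previews need to be created.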
2020-11-08 23:46:02 +01:00
Eike Kettner
eede194352
Fix deleting preview files
2020-11-08 21:27:55 +01:00
Eike Kettner
757ad31165
Add a route to get the item preview
...
This is the first available preview of an attachment, ordered by
position. If all attachments have a preview image, the preview of the
first attachment is returned.
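
A minimal sketch of this selection rule, assuming each attachment
carries a position and an optional preview. The `Attachment` type and
the `item_preview` function are hypothetical and only illustrate the
rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Attachment:
    # Hypothetical record; the field names are illustrative only.
    position: int
    preview: Optional[bytes] = None


def item_preview(attachments: Sequence[Attachment]) -> Optional[bytes]:
    """Return the preview of the lowest-positioned attachment that has one."""
    for att in sorted(attachments, key=lambda a: a.position):
        if att.preview is not None:
            return att.preview
    # No attachment has a preview.
    return None
```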
2020-11-08 15:12:56 +01:00
Eike Kettner
0841a33ae3
Add a table to hold the preview files
2020-11-08 01:25:38 +01:00
Eike Kettner
0461cfefe7
Fix sql error for mariadb <10.4
...
MariaDB below 10.4 doesn't support parentheses around selects for
`intersect` and `union`.
https://mariadb.com/kb/en/intersect/#parentheses
Fixes #404
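
As an illustration of the portable form, the sketch below joins the
selects with the set operator and no surrounding parentheses. The
`combine` helper and the table names are hypothetical, and in-memory
SQLite is used only as a stand-in to run the statement; it is not
MariaDB and says nothing about the project's actual query code.

```python
import sqlite3
from typing import Sequence


def combine(op: str, selects: Sequence[str]) -> str:
    """Join SELECT statements with a set operator, without parentheses."""
    if op.lower() not in ("intersect", "union"):
        raise ValueError(f"unsupported set operator: {op}")
    return f" {op.upper()} ".join(selects)


# Hypothetical tables, created in SQLite for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tagged_a (item_id INTEGER);
    CREATE TABLE tagged_b (item_id INTEGER);
    INSERT INTO tagged_a VALUES (1), (2), (3);
    INSERT INTO tagged_b VALUES (2), (3), (4);
    """
)
sql = combine(
    "intersect",
    ["SELECT item_id FROM tagged_a", "SELECT item_id FROM tagged_b"],
)
rows = sorted(r[0] for r in conn.execute(sql))
```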
2020-10-28 22:54:51 +01:00
Eike Kettner
b59696a9d3
Make sure to only remove/retry items in premature states
2020-10-26 23:39:26 +01:00
Eike Kettner
26e89bf84e
Edit org/person/equipment of multiple items
2020-10-26 13:35:47 +01:00
Eike Kettner
2e6026b817
Edit dates of multiple items
2020-10-26 13:16:03 +01:00
Eike Kettner
d4043634ac
Edit direction of multiple items
2020-10-26 12:48:15 +01:00
Eike Kettner
7ad37c8d26
Editing tags for multiple items
2020-10-26 11:54:04 +01:00
Eike Kettner
3e2d272746
Add unique constraint for equipment names
...
Fixes #370
2020-10-21 22:42:19 +02:00
Eike Kettner
3771587e55
Find duplicate tags without category
2020-10-19 00:30:41 +02:00
Eike Kettner
6a3386ce66
Fix sql comparison with optional values
2020-10-19 00:29:41 +02:00
Eike Kettner
80ddca9aa3
Add counter to joblog for correct log order
...
This is to distinguish log entries created at the same time.
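
The effect of the counter can be sketched as a two-part sort key.
`LogEntry` and `in_log_order` are hypothetical names; the sketch only
assumes that the counter increases with each entry written.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical shape of a job log row.
    created: datetime
    counter: int
    message: str


def in_log_order(entries: Iterable[LogEntry]) -> List[LogEntry]:
    """Order by timestamp first; the counter breaks ties between equal timestamps."""
    return sorted(entries, key=lambda e: (e.created, e.counter))
```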
2020-10-02 22:14:30 +02:00
Eike Kettner
d4354b8b49
Skip pdf conversion if a converted file exists
...
For images, the conversion also returns the extracted text. If saving
this text failed, it is extracted in the following text-extraction
step.
2020-10-02 17:39:39 +02:00
Eike Kettner
b6f23b038a
Fix finding attachments for retries
...
The attachments to process again must be searched in sources and
archives, too.
2020-10-02 17:39:34 +02:00
Eike Kettner
e26d7129e7
Add fix for mariadb text columns
...
The `text` data type can only store up to 64 KB of data; `mediumtext`
stores up to 16 MB and `longtext` up to 4 GB.
Issue: #297
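
To make the limits concrete, the sketch below picks the smallest of the
three column types that can hold a given string. The byte limits are
the documented maximum lengths of these types; the
`smallest_text_type` helper is hypothetical and assumes UTF-8 storage.

```python
# Maximum lengths in bytes of the MySQL/MariaDB text types.
TEXT_LIMITS = [
    ("text", 2**16 - 1),        # 65,535 bytes
    ("mediumtext", 2**24 - 1),  # 16,777,215 bytes
    ("longtext", 2**32 - 1),    # 4,294,967,295 bytes
]


def smallest_text_type(content: str) -> str:
    """Return the smallest text type whose byte limit fits the content."""
    size = len(content.encode("utf-8"))  # limits apply to bytes, not characters
    for name, limit in TEXT_LIMITS:
        if size <= limit:
            return name
    raise ValueError(f"{size} bytes exceed the longtext limit")
```

Because the limits count bytes, text with multi-byte characters reaches
the `text` limit well before 65,535 characters.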
2020-10-02 16:50:51 +02:00
Eike Kettner
552cdac1d3
Apply flyway api changes
2020-09-28 15:12:10 +02:00
Eike Kettner
f6f63000be
Prepend a duplicate check when uploading files
2020-09-23 23:37:00 +02:00
Eike Kettner
c658677032
Autoformat
2020-09-09 00:29:32 +02:00
Eike Kettner
eb11b33028
Fix mariadb changesets
2020-09-07 20:02:50 +02:00
Eike Kettner
76ccfb8a81
Only learn from confirmed items
...
Text classification should only learn from confirmed items. Log if
classification is disabled when processing an item.
2020-09-07 13:04:40 +02:00
Eike Kettner
cb1a9e0699
Use separate sql migration for h2
2020-09-07 13:04:29 +02:00
Eike Kettner
06879456a6
Change job priority on queue page
2020-09-05 18:50:58 +02:00
Eike Kettner
4309bd8dfd
Some cleanup
2020-09-02 21:22:30 +02:00
Eike Kettner
316b490008
Implement learning a text classifier from collective data
2020-09-02 18:28:14 +02:00
Eike Kettner
68bb65572b
Integrate learn-classifier task into the app
2020-09-02 18:28:14 +02:00
Eike Kettner
8c4f2e702b
Add classifier settings
2020-09-02 18:28:14 +02:00
Eike Kettner
de5b33c40d
Add `updated` column to some tables
2020-08-24 21:30:52 +02:00
Eike Kettner
96d2f948f2
Use collective's addressbook to configure regexner
2020-08-24 14:40:52 +02:00
Eike Kettner
3986487f11
Add api docs and cleanup
2020-08-13 21:22:54 +02:00
Eike Kettner
69674eb485
Improve job-queue query to make sure jobs across all states show up
2020-08-13 01:06:13 +02:00
Eike Kettner
41ea071555
Add a task to convert all pdfs that have not been converted
2020-08-13 01:06:13 +02:00
Eike Kettner
07e9a9767e
Add a task to re-process files of an item
2020-08-12 22:29:56 +02:00
Eike Kettner
098e4cf868
Fix uploading to enabled/disabled source endpoints
2020-08-09 09:21:23 +02:00
Eike Kettner
06ad9ac46c
Add routes to conveniently set/toggle tags
2020-08-08 15:08:04 +02:00
Eike Kettner
1c8b66194b
Add a route to return used tags
...
This is part of the `/insights` route without queries for file usage.
2020-08-08 08:35:35 +02:00
Eike Kettner
a4796f3f7f
Return more tag details with item insights
2020-08-08 00:41:20 +02:00
Eike Kettner
f3ba224124
Add missing organization/person/equipment routes
2020-08-07 01:30:43 +02:00
Eike Kettner
070c2b5e5f
Allow to search by tag categories
...
The server accepts a list of tag categories for inclusion and
exclusion. For the include list, only items that have at least one tag
of each listed category are returned. For the exclude list, only items
that have no tag in any of the listed categories are returned.
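
The include/exclude semantics can be sketched with set operations.
This is an in-memory illustration only; `filter_by_categories` and the
shape of its input are assumptions and do not reflect how the server
builds its query.

```python
from typing import Dict, Iterable, List, Set


def filter_by_categories(
    item_categories: Dict[str, Set[str]],
    include: Iterable[str],
    exclude: Iterable[str],
) -> List[str]:
    """Filter items by the categories of their tags.

    item_categories maps an item id to the set of categories of its tags.
    An item passes if it has a tag in every included category and no tag
    in any excluded category.
    """
    inc, exc = set(include), set(exclude)
    return [
        item_id
        for item_id, cats in item_categories.items()
        if inc <= cats and not (exc & cats)
    ]
```

For example, including `doctype` and `year` keeps only items tagged in
both categories, while excluding `private` drops every item with at
least one tag in that category.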
2020-08-06 21:43:27 +02:00
Eike Kettner
09d74b7e80
Return item notes with search results
...
To keep the response from growing very large, an admin can define a
limit on how much to return.
2020-08-05 00:09:37 +02:00
Eike Kettner
209c068436
Use keywords in pdfs to search for existing tags
...
During processing, keywords stored in PDF metadata are used to look
them up in the tag database and associate any existing tags to the
item.
See #175
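
A sketch of the lookup, under the assumption that keywords are matched
against tag names ignoring case and surrounding whitespace; the commit
does not state the matching rule, and `match_existing_tags` is a
hypothetical name. Keywords without a matching tag are ignored, so no
new tags are created.

```python
from typing import Iterable, List


def match_existing_tags(keywords: Iterable[str], existing_tags: Iterable[str]) -> List[str]:
    """Return the existing tags named by the keywords, without duplicates."""
    # Assumption: case-insensitive comparison on trimmed keywords.
    by_name = {tag.lower(): tag for tag in existing_tags}
    found: List[str] = []
    for keyword in keywords:
        tag = by_name.get(keyword.strip().lower())
        if tag is not None and tag not in found:
            found.append(tag)
    return found
```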
2020-07-19 00:28:04 +02:00
Eike Kettner
3d49ceaab5
Use ocrmypdf tool to create pdf/a during conversion
...
- Use another external tool to convert pdf to pdf which also adds the
extracted text as another layer into the pdf
- Although not used, the external conversion routine will now check
for an existing text file that is named as the pdf file with extension
`.txt`. If present it is included in the conversion result and will be
used as the extracted text.
- Text extraction for pdf files now happens on the converted file,
because it may already contain the text from the conversion step;
this avoids running OCR twice.
- All errors during conversion are not fatal; processing continues
without a converted file.
2020-07-18 17:19:29 +02:00
Eike Kettner
c697501571
Add folders sql changeset for mariadb
2020-07-14 23:22:52 +02:00