docspell

mirror of https://github.com/TheAnachronism/docspell.git synced 2025-06-22 18:38:26 +00:00

Author	SHA1	Message	Date
Eike Kettner	04ba14f802	Amend source form with tags and file-filter Allow to define tags and a file filter per source.	2020-11-12 22:37:28 +01:00
Eike Kettner	10305bc82d	Minor improvements	2020-11-09 21:16:53 +01:00
Eike Kettner	29455d638c	Add startup task to find page counts of existing files	2020-11-09 20:35:35 +01:00
Eike Kettner	8c08bf233d	Amend search results with attachment info This uses again another query per item to retrieve some information about each attachment already in the search results.	2020-11-09 14:24:28 +01:00
Eike Kettner	a77f34b7ba	Add a processing step to retrieve page counts	2020-11-09 11:08:24 +01:00
Eike Kettner	d4bbb936b6	Count preview image sizes in insight data	2020-11-09 09:00:03 +01:00
Eike Kettner	f4e50c5229	Provide endpoints to submit tasks to re-generate previews The scaling factor can be given in the config file. When this changes, images can be regenerated via POSTing to certain endpoints. It is possible to regenerate just one attachment preview or all within a collective.	2020-11-09 09:00:02 +01:00
Eike Kettner	709848244c	Create tasks to generate all previews There is a task to generate preview images per attachment. It can either add them (if not present yet) or overwrite them (e.g. some config has changed). There is a task that selects all attachments without previews and submits a task to create it. This is submitted on start automatically to generate previews for all existing attachments.	2020-11-08 23:46:02 +01:00
Eike Kettner	eede194352	Fix deleting preview files	2020-11-08 21:27:55 +01:00
Eike Kettner	757ad31165	Add a route to get the item preview This is the first available preview of an attachment wrt position. If all attachments have a preview image, the preview of the first attachment is returned.	2020-11-08 15:12:56 +01:00
Eike Kettner	0841a33ae3	Add a table to hold the preview files	2020-11-08 01:25:38 +01:00
Eike Kettner	0461cfefe7	Fix sql error for mariadb <10.4 MariaDB below 10.4 doesn't support parentheses around selects for `intersect` and `union`. https://mariadb.com/kb/en/intersect/#parentheses Fixes #404	2020-10-28 22:54:51 +01:00
Eike Kettner	b59696a9d3	Make sure to only remove/retry items in premature states	2020-10-26 23:39:26 +01:00
Eike Kettner	26e89bf84e	Edit org/person/equipment of multiple items	2020-10-26 13:35:47 +01:00
Eike Kettner	2e6026b817	Edit dates of multiple items	2020-10-26 13:16:03 +01:00
Eike Kettner	d4043634ac	Edit direction of multiple items	2020-10-26 12:48:15 +01:00
Eike Kettner	7ad37c8d26	Editing tags for multiple items	2020-10-26 11:54:04 +01:00
Eike Kettner	3e2d272746	Add unique constraint for equipment names Fixes #370	2020-10-21 22:42:19 +02:00
Eike Kettner	3771587e55	Find duplicate tags without category	2020-10-19 00:30:41 +02:00
Eike Kettner	6a3386ce66	Fix sql comparison with optional values	2020-10-19 00:29:41 +02:00
Eike Kettner	80ddca9aa3	Add counter to joblog for correct log order This is to distinguish log entries created at the same time.	2020-10-02 22:14:30 +02:00
Eike Kettner	d4354b8b49	Skip pdf conversion if a converted file exists For images the conversion also returns the extracted text. If saving that text failed, it is extracted again in the following text-extraction step.	2020-10-02 17:39:39 +02:00
Eike Kettner	b6f23b038a	Fix finding attachments for retries The attachments to process again must be searched in sources and archives, too.	2020-10-02 17:39:34 +02:00
Eike Kettner	e26d7129e7	Add fix for mariadb text columns The `text` data type can only store up to 64kb data. The `mediumtext` up to 16M and `longtext` up to 4G. Issue: #297	2020-10-02 16:50:51 +02:00
Eike Kettner	552cdac1d3	Apply flyway api changes	2020-09-28 15:12:10 +02:00
Eike Kettner	f6f63000be	Prepend a duplicate check when uploading files	2020-09-23 23:37:00 +02:00
Eike Kettner	c658677032	Autoformat	2020-09-09 00:29:32 +02:00
Eike Kettner	eb11b33028	Fix mariadb changesets	2020-09-07 20:02:50 +02:00
Eike Kettner	76ccfb8a81	Only learn from confirmed items Text classification should only learn from confirmed items. Log if classification is disabled when processing an item.	2020-09-07 13:04:40 +02:00
Eike Kettner	cb1a9e0699	Use separate sql migration for h2	2020-09-07 13:04:29 +02:00
Eike Kettner	06879456a6	Change job priority on queue page	2020-09-05 18:50:58 +02:00
Eike Kettner	4309bd8dfd	Some cleanup	2020-09-02 21:22:30 +02:00
Eike Kettner	316b490008	Implement learning a text classifier from collective data	2020-09-02 18:28:14 +02:00
Eike Kettner	68bb65572b	Integrate learn-classifier task into the app	2020-09-02 18:28:14 +02:00
Eike Kettner	8c4f2e702b	Add classifier settings	2020-09-02 18:28:14 +02:00
Eike Kettner	de5b33c40d	Add `updated` column to some tables	2020-08-24 21:30:52 +02:00
Eike Kettner	96d2f948f2	Use collective's addressbook to configure regexner	2020-08-24 14:40:52 +02:00
Eike Kettner	3986487f11	Add api docs and cleanup	2020-08-13 21:22:54 +02:00
Eike Kettner	69674eb485	Improve job-queue query to make sure jobs across all states show up	2020-08-13 01:06:13 +02:00
Eike Kettner	41ea071555	Add a task to convert all pdfs that have not been converted	2020-08-13 01:06:13 +02:00
Eike Kettner	07e9a9767e	Add a task to re-process files of an item	2020-08-12 22:29:56 +02:00
Eike Kettner	098e4cf868	Fix uploading to enabled/disabled source endpoints	2020-08-09 09:21:23 +02:00
Eike Kettner	06ad9ac46c	Add routes to conveniently set/toggle tags	2020-08-08 15:08:04 +02:00
Eike Kettner	1c8b66194b	Add a route to return used tags This is part of the `/insights` route without queries for file usage.	2020-08-08 08:35:35 +02:00
Eike Kettner	a4796f3f7f	Return more tag details with item insights	2020-08-08 00:41:20 +02:00
Eike Kettner	f3ba224124	Add missing organization/person/equipment routes	2020-08-07 01:30:43 +02:00
Eike Kettner	070c2b5e5f	Allow to search by tag categories The server accepts a list of tag categories for inclusion and exclusion. The categories in the include list imply to return items that have at least one tag of each category. The categories in the exclude list imply to return all items that have no tag in any of these categories.	2020-08-06 21:43:27 +02:00
Eike Kettner	09d74b7e80	Return item notes with search results To keep the response from growing very large, an admin can define a limit on how much to return.	2020-08-05 00:09:37 +02:00
Eike Kettner	209c068436	Use keywords in pdfs to search for existing tags During processing, keywords stored in PDF metadata are used to look them up in the tag database and associate any existing tags to the item. See #175	2020-07-19 00:28:04 +02:00
Eike Kettner	3d49ceaab5	Use ocrmypdf tool to create pdf/a during conversion - Use another external tool to convert pdf to pdf which also adds the extracted text as another layer into the pdf - Although not used, the external conversion routine will now check for an existing text file that is named as the pdf file with extension `.txt`. If present it is included in the conversion result and will be used as the extracted text. - text extraction for pdf files happens now on the converted file, because it may already contain the text from the conversion step and thus avoids running OCR twice. - All errors during conversion are not fatal; processing continues without a converted file.	2020-07-18 17:19:29 +02:00
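
The include/exclude semantics described in commit 070c2b5e5f (search by tag categories) can be illustrated with a small sketch. This is not docspell's implementation, which runs as SQL on the server; the function name `filter_by_categories` and the in-memory item model are assumptions made only for illustration.

```python
from typing import Dict, Iterable, List, Set


def filter_by_categories(
    items: Dict[str, Set[str]],
    include: Iterable[str],
    exclude: Iterable[str],
) -> List[str]:
    """Return ids of items matching the category filter.

    `items` is a hypothetical model mapping an item id to the set of
    categories of the tags attached to that item.
    """
    include_set = set(include)
    exclude_set = set(exclude)
    return [
        item_id
        for item_id, categories in items.items()
        # include: the item needs at least one tag of *each* listed category
        if include_set <= categories
        # exclude: the item must have no tag in *any* listed category
        and not (exclude_set & categories)
    ]


# Hypothetical sample data: item id -> categories of its tags
sample = {
    "invoice-1": {"doctype", "year"},
    "letter-2": {"doctype"},
    "note-3": {"year", "private"},
}
```

With this sample, including `doctype` and `year` keeps only `invoice-1`, while excluding `private` drops only `note-3`.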

357 Commits