docspell

mirror of https://github.com/TheAnachronism/docspell.git synced 2025-02-15 20:33:26 +00:00

Author	SHA1	Message	Date
mergify[bot]	8dd1672c8c	Merge pull request #583 from eikek/fix-baseurl-setting Render baseurl without trailing slash	2021-01-21 23:44:14 +00:00
Eike Kettner	0ec620fcf0	Render baseurl without trailing slash The webapp expects it like this currently, because the url is only a string.	2021-01-21 21:42:08 +01:00
Eike Kettner	4cba96f390	Always return classifier results as suggestion The classifier results are spliced into the suggestion list at second place. When linking they are only used if nlp didn't find anything.	2021-01-21 21:05:28 +01:00
Eike Kettner	5c487ef7a9	Refactor running classifier in text analysis	2021-01-19 21:30:02 +01:00
Eike Kettner	a6f29153c4	Control what tag categories to use for auto-tagging	2021-01-19 01:20:13 +01:00
Eike Kettner	3f75af0807	Add 9 more languages to the list of document languages	2021-01-18 17:41:40 +01:00
Eike Kettner	26dff18ae0	Add spanish as an example Adding a new language without nlp requires now only to fill out the pieces: - define a list of month names to support date recognition - add it to joex' dockerfile to be available for tesseract - update the solr migration/field definitions - update the elm file so it shows up on the client	2021-01-18 17:41:40 +01:00
Eike Kettner	f01646aeb5	Reorganize nlp pipeline and add nlp-unsupported language italian Improves and reorganizes how nlp pipelines are set up. Now users can choose from many options, depending on their hardware and usage scenario. This is the base to use more languages without depending on what stanford-nlp supports. Support is then limited to text extraction and simple regex-ner processing.	2021-01-18 17:41:40 +01:00
Eike Kettner	aa937797be	Choose nlp mode in config file	2021-01-17 22:56:33 +01:00
Eike Kettner	d712f8303d	Make glob matching case-insensitive by default	2021-01-09 13:23:15 +01:00
Eike Kettner	b08e88cd69	Add (unofficial) routes to get system information	2021-01-05 20:54:53 +01:00
Eike Kettner	668abf2140	Add a reset-password admin route	2021-01-04 20:59:31 +01:00
Eike Kettner	77627534bc	Improve on basic search summary	2020-12-15 23:37:02 +01:00
Eike Kettner	e3f6892abd	Convert job record	2020-12-15 21:03:46 +01:00
Eike Kettner	290989f67f	Reorder correspondent person suggestion based on org relationship	2020-12-01 23:39:45 +01:00
Eike Kettner	3fabe0a582	Update to Scala 2.13.4	2020-11-27 20:26:24 +01:00
Eike Kettner	5fe532001b	Allow to specify document language with the request	2020-11-23 20:49:01 +01:00
Eike Kettner	7712e02d2d	Don't allow empty custom field values	2020-11-23 10:38:59 +01:00
Eike Kettner	93295d63a5	Change custom field values for a single item	2020-11-22 21:41:09 +01:00
Eike Kettner	62313ab03a	Add and change custom fields	2020-11-22 21:41:09 +01:00
Eike Kettner	248ad04dd0	Prepare custom fields	2020-11-22 21:41:09 +01:00
Eike Kettner	5034e12bec	Add a subject filter to scan-mailbox args	2020-11-13 23:15:20 +01:00
Eike Kettner	4fd6e02ec0	Improve glob and filter archive entries	2020-11-11 21:01:23 +01:00
Eike Kettner	55a6f7aaf6	Add more properties to upload meta data	2020-11-11 21:01:23 +01:00
Eike Kettner	a21a97f7d5	Add a simple glob data type	2020-11-10 22:44:08 +01:00
Eike Kettner	29455d638c	Add startup task to find page counts of existing files	2020-11-09 20:35:35 +01:00
Eike Kettner	f4e50c5229	Provide endpoints to submit tasks to re-generate previews The scaling factor can be given in the config file. When this changes, images can be regenerated via POSTing to certain endpoints. It is possible to regenerate just one attachment preview or all within a collective.	2020-11-09 09:00:02 +01:00
Eike Kettner	709848244c	Create tasks to generate all previews There is a task to generate preview images per attachment. It can either add them (if not present yet) or overwrite them (e.g. some config has changed). There is a task that selects all attachments without previews and submits a task to create it. This is submitted on start automatically to generate previews for all existing attachments.	2020-11-08 23:46:02 +01:00
Eike Kettner	ef7cb4e779	Create a preview image of all files during processing	2020-11-08 01:25:59 +01:00
Eike Kettner	0114bb4d72	Use source name from config file for integration endpoint uploads Fixes: #389	2020-10-26 22:37:30 +01:00
Eike Kettner	f6f63000be	Prepend a duplicate check when uploading files	2020-09-23 23:37:00 +02:00
Eike Kettner	d8bb6dcba3	Dynamically configure cookie and base-url When `base-url` is the default (i.e. localhost), the cookie is now configured with the domain doing the request and the webapp is configured to run requests against the host in the address bar of the browser.	2020-09-13 14:05:20 +02:00
Eike Kettner	c658677032	Autoformat	2020-09-09 00:29:32 +02:00
Eike Kettner	76ccfb8a81	Only learn from confirmed items Text classification should only learn from confirmed items. Log if classification is disabled when processing an item.	2020-09-07 13:04:40 +02:00
Eike Kettner	06879456a6	Change job priority on queue page	2020-09-05 18:50:58 +02:00
Eike Kettner	8c4f2e702b	Add classifier settings	2020-09-02 18:28:14 +02:00
Eike Kettner	3473cbb773	Use collective data with NER annotation	2020-08-25 20:40:44 +02:00
Eike Kettner	96d2f948f2	Use collective's addressbook to configure regexner	2020-08-24 14:40:52 +02:00
Eike Kettner	8628a0a8b3	Allow configuring stanford-ner and cache based on collective	2020-08-24 10:55:59 +02:00
Eike Kettner	fdb46da26d	Add french language and upgrade stanford-nlp to 4.0.0	2020-08-23 17:48:42 +02:00
Eike Kettner	3986487f11	Add api docs and cleanup	2020-08-13 21:22:54 +02:00
Eike Kettner	41ea071555	Add a task to convert all pdfs that have not been converted	2020-08-13 01:06:13 +02:00
Eike Kettner	07e9a9767e	Add a task to re-process files of an item	2020-08-12 22:29:56 +02:00
Eike Kettner	45b0deeced	Print solr url on start This is useful info to see which url has been selected, same as db connection.	2020-08-01 15:59:14 +02:00
Eike Kettner	5b01c93711	Add a folder-id to item processing This allows to define a folder when uploading files. All generated items are associated to this folder on creation.	2020-07-14 23:18:39 +02:00
Eike Kettner	347a029af8	Scalafix organize-imports	2020-06-28 21:20:47 +02:00
Eike Kettner	41c0f70d3b	Fix cancelling jobs A request to cancel a job was not processed correctly. The cancelling routine of a task must run, regardless of the (non-final) state. Now it works like this: if a job is currently running, it is interrupted and its cancel routine is invoked. It then enters "cancelled" state. If it is stuck, it is loaded and only its cancel routine is run. If it is in a final state or waiting, it is removed from the queue.	2020-06-26 23:08:27 +02:00
Eike Kettner	d79ae6233a	Restrict proposals for due date Avoid dates too far in the future.	2020-06-26 16:58:17 +02:00
Eike Kettner	15c0fb4395	Merge branch 'master' into fts	2020-06-23 00:32:27 +02:00
Eike Kettner	e06a3f8fdd	ScalafmtAll	2020-06-23 00:18:59 +02:00
Eike Kettner	0d8b03fc61	Add backend operations for re-creating the full-text index	2020-06-21 15:46:51 +02:00
Eike Kettner	7609b2b7c3	Run scalafmtAll	2020-06-20 23:03:51 +02:00
Eike Kettner	3576c45d1a	First basic working solr search	2020-06-20 02:18:49 +02:00
Eike Kettner	146d1b0562	Make data to index more flexible and extensible	2020-06-17 23:20:46 +02:00
Eike Kettner	897d91475e	Update scalafmt-core to 2.6.0	2020-06-17 19:53:56 +02:00
Eike Kettner	4b0eb650f2	Rename package to avoid name clashes	2020-05-25 16:22:09 +02:00
Eike Kettner	ee394eae86	Try streamline the different impls for `MimeType`	2020-05-25 09:24:24 +02:00
Eike Kettner	f4949446e3	Allow to specify an item id to amend files to existing items	2020-05-23 20:15:55 +02:00
Eike Kettner	25d089da6c	Update state and proposals only on invalid items Invalid items are those that are not ready, and not shown to the user. When changing metadata, it should only be changed, if the item was not already shown to the user.	2020-05-23 15:46:24 +02:00
Eike Kettner	f74f8e5198	Add new way for uploading files to any collective Applications running next to docspell may want a way to upload files to any collective for integration purposes. This endpoint can be used for this. It is disabled by default and can be enabled via the configuration file.	2020-05-23 14:29:24 +02:00
Eike Kettner	9f9dd6c0fb	Change routes for scan-mailbox task to allow multiple tasks per user	2020-05-21 22:04:45 +02:00
Eike Kettner	f2d67dc816	Initial impl of import from mailbox user task	2020-05-20 17:52:38 +02:00
Eike Kettner	6e8582ea80	Implement scan-mailbox form and routes	2020-05-20 17:52:38 +02:00
Eike Kettner	5d5311913c	Add ScanMailboxArgs	2020-05-20 17:52:38 +02:00
Eike Kettner	d65c1e0d36	Use date from e-mails to set item date	2020-05-17 11:58:51 +02:00
Eike Kettner	3e10e2175a	Sort by weights better and save them	2020-05-17 11:58:51 +02:00
Eike Kettner	c41cdeefec	Update scalafmt to 2.5.1 + scalafmtAll	2020-05-04 23:53:57 +02:00
Eike Kettner	0a1b3fcf95	Set list-id header for notification mails	2020-04-30 21:23:56 +02:00
Eike Kettner	75a66ecb86	Update http4s to 0.21.4	2020-04-29 01:05:13 +02:00
Eike Kettner	84e0ebf1a2	Add a flag for restricting overdue items	2020-04-23 21:37:03 +02:00
Eike Kettner	d52efdfcf0	Improve mail template	2020-04-22 23:41:09 +02:00
Eike Kettner	e1f9ae2629	Include links to items into mail template	2020-04-22 21:53:25 +02:00
Eike Kettner	2723d6b43b	Implement notify-due-items task	2020-04-22 21:08:45 +02:00
Eike Kettner	3524904faf	Add routes to check calendar events	2020-04-22 21:08:45 +02:00
Eike Kettner	ad772c0c25	Server-side stub impl for notify-due-items	2020-04-22 21:08:45 +02:00
Eike Kettner	1206105f0b	Fix several bugs with handling e-mail files - When converting from html->pdf, the wkhtmltopdf program exits with errors if the document contains invalid links. The content is now cleaned before handed to wkhtmltopdf. - Update emil library which fixes a bug when reading mails without explicit transfer encoding (8bit) - Add a info header to converted mails	2020-04-07 22:38:25 +02:00
Eike Kettner	14a25fe23e	Fix serializing mediatype parameters	2020-03-27 21:50:06 +01:00
Eike Kettner	9656ba62f4	scalafmtAll	2020-03-26 18:26:00 +01:00
Eike Kettner	0b80572664	Fix encodings for mails with non-utf8 html parts	2020-03-24 23:40:29 +01:00
Eike Kettner	cf7ccd572c	Improve handling encodings Html and text files are not fixed to be UTF-8. The encoding is now detected, which may not work for all files. Default/fallback will be utf-8. There is still a problem with mails that contain html parts not in utf8 encoding. The mail text is always returned as a string and the original encoding is lost. Then the html is stored using utf-8 bytes, but wkhtmltopdf reads it using latin1. It seems that the `--encoding` setting doesn't override encoding provided by the document.	2020-03-23 22:51:28 +01:00
Eike Kettner	cba466ed47	Set item due date candidate After processing, set the due date of an item to the first candidate. The earliest due date is considered best match.	2020-03-20 22:39:09 +01:00
Eike Kettner	6b1156182c	Add support for eml (rfc822 email) files	2020-03-19 22:42:40 +01:00
Eike Kettner	f0449dd2ce	Properly initialize thread pools	2020-03-17 22:37:12 +01:00
Eike Kettner	00ca6b5697	Improve text analysis - Search for consecutive labels - Sort list of candidates by a weight - Search for organizations using person labels	2020-03-17 22:34:50 +01:00
Eike Kettner	854a596da3	Integrate periodic tasks The first use case for periodic task is the cleanup of expired invitation keys. This is part of a house-keeping periodic task.	2020-03-08 22:49:49 +01:00
Eike Kettner	616c333fa5	Implement storage routines for periodic scheduler	2020-03-08 13:56:23 +01:00
Eike Kettner	1e598bd902	Sketch a scheduler for running periodic tasks Periodic tasks are special in that they are usually kept around and started based on a schedule. A new component checks periodic tasks and submits them in the queue once they are due. In order to avoid duplicate periodic jobs, the tracker of a job is used to store the periodic job id. Each time a periodic task is due, it is first checked if there is a job running (or queued) for this task.	2020-03-08 12:55:03 +01:00
Eike Kettner	2f87065b2e	sbt scalafmtAll	2020-02-25 20:55:00 +01:00
Eike Kettner	fbe0c1aec5	Allow more chars for mimetype	2020-02-20 00:39:31 +01:00
Eike Kettner	97305d27ff	Integrate support for more files into processing and upload The restriction that only pdf files can be uploaded is removed. All files can now be uploaded. The processing may not process all. It is still possible to restrict file uploads by types via a configuration.	2020-02-19 23:27:00 +01:00
Eike Kettner	9b1349734e	Convert some files to pdf	2020-02-19 02:03:10 +01:00
Eike Kettner	5869e2ee6e	Streamline extern-conv stdin/infile	2020-02-18 12:43:47 +01:00
Eike Kettner	0dcc00836b	Make logger configurable in system commands	2020-02-18 12:02:43 +01:00
Eike Kettner	bd605b8c94	Add first drafts for converting	2020-02-18 01:31:22 +01:00
Eike Kettner	e0682464b5	Configure pdf extraction; move Logger and DataType to common	2020-02-17 14:01:36 +01:00
Eike Kettner	3d615181e0	Early draft for text extraction	2020-02-17 01:57:22 +01:00
Eike Kettner	1309c8b7fa	Move mimetype detection to docspell-files	2020-02-14 22:06:18 +01:00
Eike Kettner	bf9bf25502	Rename example files	2020-02-14 11:10:54 +01:00
Eike Kettner	2c0425433e	Move File class to common module	2020-02-11 22:42:04 +01:00
Eike Kettner	3be90d64d5	Move `SystemCommand` to common module	2020-02-10 22:23:06 +01:00
Eike Kettner	ba3865ef5e	Starting to support more file types First, files are converted to PDF for archiving. It is also easier to create a preview. This is done via the `ConvertPdf` processing task (which is not yet implemented). Text extraction then tries first with the original file. If that fails, OCR is done on the (potentially) converted pdf file. To not lose information of the original file, it is saved using the table `attachment_source`. If the original file is already a pdf, or the conversion did not succeed, the `attachment` and `attachment_source` record point to the same file.	2020-02-10 12:42:45 +01:00
Eike Kettner	5c37efeaba	Apply scalafmt to all files	2020-02-09 01:54:26 +01:00
Eike Kettner	88efe13209	Fix item route responses Also avoid storing empty strings in a nullable field.	2020-01-11 12:58:04 +01:00
Eike Kettner	4490a444a9	Allow dots in identifiers	2020-01-07 00:20:41 +01:00
Eike Kettner	9020d9aa3b	Don't require a prefix when configuring byte arrays	2020-01-05 15:29:58 +01:00
Eike Kettner	8814de3c38	Allow simple search when listing meta data	2020-01-02 20:21:49 +01:00
Eike Kettner	fc3e22e399	Apply scalafmt to all files	2019-12-30 21:44:13 +01:00
Eike Kettner	a9e70401de	Update dependencies	2019-12-28 12:38:11 +01:00
Eike Kettner	07a23b9611	Fix percent encoding Must use utf8 bytes, of course…	2019-12-11 21:56:31 +01:00
Eike Kettner	2ad1586d00	Set stricter compile options and fix cookie data	2019-09-28 22:17:45 +02:00
Eike Kettner	831cd8b655	Initial version. Features: - Upload PDF files and let them be analyzed - Manage meta data and items - See processing in webapp	2019-09-21 22:02:36 +02:00


211 Commits