Commit Graph

209 Commits

Author SHA1 Message Date
Eike Kettner
3576c45d1a First basic working solr search 2020-06-20 02:18:49 +02:00
Eike Kettner
146d1b0562 Make data to index more flexible and extensible 2020-06-17 23:20:46 +02:00
Eike Kettner
897d91475e Update scalafmt-core to 2.6.0 2020-06-17 19:53:56 +02:00
Eike Kettner
4b0eb650f2 Rename package to avoid name clashes 2020-05-25 16:22:09 +02:00
Eike Kettner
ee394eae86 Try to streamline the different impls for MimeType 2020-05-25 09:24:24 +02:00
Eike Kettner
f4949446e3 Allow specifying an item id to amend files to existing items 2020-05-23 20:15:55 +02:00
Eike Kettner
25d089da6c Update state and proposals only on invalid items
Invalid items are those that are not ready and not yet shown to the user.
Metadata should only be changed if the item has not already been shown
to the user.
2020-05-23 15:46:24 +02:00
Eike Kettner
f74f8e5198 Add new way for uploading files to any collective
Applications running next to docspell may want a way to upload files
to any collective for integration purposes. This endpoint can be used
for this. It is disabled by default and can be enabled via the
configuration file.
2020-05-23 14:29:24 +02:00
Eike Kettner
9f9dd6c0fb Change routes for scan-mailbox task to allow multiple tasks per user 2020-05-21 22:04:45 +02:00
Eike Kettner
f2d67dc816 Initial impl of import from mailbox user task 2020-05-20 17:52:38 +02:00
Eike Kettner
6e8582ea80 Implement scan-mailbox form and routes 2020-05-20 17:52:38 +02:00
Eike Kettner
5d5311913c Add ScanMailboxArgs 2020-05-20 17:52:38 +02:00
Eike Kettner
d65c1e0d36 Use date from e-mails to set item date 2020-05-17 11:58:51 +02:00
Eike Kettner
3e10e2175a Sort by weights better and save them 2020-05-17 11:58:51 +02:00
Eike Kettner
c41cdeefec Update scalafmt to 2.5.1 + scalafmtAll 2020-05-04 23:53:57 +02:00
Eike Kettner
0a1b3fcf95 Set list-id header for notification mails 2020-04-30 21:23:56 +02:00
Eike Kettner
75a66ecb86 Update http4s to 0.21.4 2020-04-29 01:05:13 +02:00
Eike Kettner
84e0ebf1a2 Add a flag for restricting overdue items 2020-04-23 21:37:03 +02:00
Eike Kettner
d52efdfcf0 Improve mail template 2020-04-22 23:41:09 +02:00
Eike Kettner
e1f9ae2629 Include links to items into mail template 2020-04-22 21:53:25 +02:00
Eike Kettner
2723d6b43b Implement notify-due-items task 2020-04-22 21:08:45 +02:00
Eike Kettner
3524904faf Add routes to check calendar events 2020-04-22 21:08:45 +02:00
Eike Kettner
ad772c0c25 Server-side stub impl for notify-due-items 2020-04-22 21:08:45 +02:00
Eike Kettner
1206105f0b Fix several bugs with handling e-mail files
- When converting from html->pdf, the wkhtmltopdf program exits with
  errors if the document contains invalid links. The content is now
  cleaned before it is handed to wkhtmltopdf.
- Update emil library, which fixes a bug when reading mails without
  explicit transfer encoding (8bit)
- Add an info header to converted mails
2020-04-07 22:38:25 +02:00
Eike Kettner
14a25fe23e Fix serializing mediatype parameters 2020-03-27 21:50:06 +01:00
Eike Kettner
9656ba62f4 scalafmtAll 2020-03-26 18:26:00 +01:00
Eike Kettner
0b80572664 Fix encodings for mails with non-utf8 html parts 2020-03-24 23:40:29 +01:00
Eike Kettner
cf7ccd572c Improve handling encodings
Html and text files are no longer assumed to be UTF-8. The encoding is
now detected, which may not work for all files. The default/fallback is
UTF-8.

There is still a problem with mails that contain html parts not in
UTF-8 encoding. The mail text is always returned as a string and the
original encoding is lost. The html is then stored using UTF-8 bytes,
but wkhtmltopdf reads it using latin1. It seems that the `--encoding`
setting doesn't override the encoding provided by the document.
2020-03-23 22:51:28 +01:00
Eike Kettner
cba466ed47 Set item due date candidate
After processing, set the due date of an item to the first candidate.
The earliest due date is considered the best match.
2020-03-20 22:39:09 +01:00
Eike Kettner
6b1156182c Add support for eml (rfc822 email) files 2020-03-19 22:42:40 +01:00
Eike Kettner
f0449dd2ce Properly initialize thread pools 2020-03-17 22:37:12 +01:00
Eike Kettner
00ca6b5697 Improve text analysis
- Search for consecutive labels

- Sort list of candidates by a weight

- Search for organizations using person labels
2020-03-17 22:34:50 +01:00
Eike Kettner
854a596da3 Integrate periodic tasks
The first use case for periodic tasks is the cleanup of expired
invitation keys. This is part of a house-keeping periodic task.
2020-03-08 22:49:49 +01:00
Eike Kettner
616c333fa5 Implement storage routines for periodic scheduler 2020-03-08 13:56:23 +01:00
Eike Kettner
1e598bd902 Sketch a scheduler for running periodic tasks
Periodic tasks are special in that they are usually kept around and
started based on a schedule. A new component checks periodic tasks and
submits them to the queue once they are due.

In order to avoid duplicate periodic jobs, the tracker of a job is
used to store the periodic job id. Each time a periodic task is due,
it is first checked whether a job for this task is already running
(or queued).
2020-03-08 12:55:03 +01:00
Eike Kettner
2f87065b2e sbt scalafmtAll 2020-02-25 20:55:00 +01:00
Eike Kettner
fbe0c1aec5 Allow more chars for mimetype 2020-02-20 00:39:31 +01:00
Eike Kettner
97305d27ff Integrate support for more files into processing and upload
The restriction that only pdf files can be uploaded is removed. All
files can now be uploaded, although processing may not support all of
them. It is still possible to restrict file uploads by type via the
configuration.
2020-02-19 23:27:00 +01:00
Eike Kettner
9b1349734e Convert some files to pdf 2020-02-19 02:03:10 +01:00
Eike Kettner
5869e2ee6e Streamline extern-conv stdin/infile 2020-02-18 12:43:47 +01:00
Eike Kettner
0dcc00836b Make logger configurable in system commands 2020-02-18 12:02:43 +01:00
Eike Kettner
bd605b8c94 Add first drafts for converting 2020-02-18 01:31:22 +01:00
Eike Kettner
e0682464b5 Configure pdf extraction; move Logger and DataType to common 2020-02-17 14:01:36 +01:00
Eike Kettner
3d615181e0 Early draft for text extraction 2020-02-17 01:57:22 +01:00
Eike Kettner
1309c8b7fa Move mimetype detection to docspell-files 2020-02-14 22:06:18 +01:00
Eike Kettner
bf9bf25502 Rename example files 2020-02-14 11:10:54 +01:00
Eike Kettner
2c0425433e Move File class to common module 2020-02-11 22:42:04 +01:00
Eike Kettner
3be90d64d5 Move SystemCommand to common module 2020-02-10 22:23:06 +01:00
Eike Kettner
ba3865ef5e Starting to support more file types
First, files are converted to PDF for archiving, which also makes it
easier to create a preview. This is done via the `ConvertPdf` processing
task (which is not yet implemented).

Text extraction then first tries the original file. If that fails,
OCR is done on the (potentially) converted pdf file.

To not lose information about the original file, it is saved using the
table `attachment_source`. If the original file is already a pdf, or
the conversion did not succeed, the `attachment` and
`attachment_source` record point to the same file.
2020-02-10 12:42:45 +01:00
Eike Kettner
5c37efeaba Apply scalafmt to all files 2020-02-09 01:54:26 +01:00
Eike Kettner
88efe13209 Fix item route responses
Also avoid storing empty strings in a nullable field.
2020-01-11 12:58:04 +01:00
Eike Kettner
4490a444a9 Allow dots in identifiers 2020-01-07 00:20:41 +01:00
Eike Kettner
9020d9aa3b Don't require a prefix when configuring byte arrays 2020-01-05 15:29:58 +01:00
Eike Kettner
8814de3c38 Allow simple search when listing meta data 2020-01-02 20:21:49 +01:00
Eike Kettner
fc3e22e399 Apply scalafmt to all files 2019-12-30 21:44:13 +01:00
Eike Kettner
a9e70401de Update dependencies 2019-12-28 12:38:11 +01:00
Eike Kettner
07a23b9611 Fix percent encoding
Must use utf8 bytes, of course…
2019-12-11 21:56:31 +01:00
Eike Kettner
2ad1586d00 Set stricter compile options and fix cookie data 2019-09-28 22:17:45 +02:00
Eike Kettner
831cd8b655 Initial version.
Features:

- Upload PDF files and let them be analyzed

- Manage meta data and items

- See processing in webapp
2019-09-21 22:02:36 +02:00