Commit Graph

19 Commits

SHA1 Message Date
e26d7129e7 Add fix for mariadb text columns
The `text` data type can store only up to 64 KB of data; `mediumtext`
stores up to 16 MB and `longtext` up to 4 GB.

Issue: #297
2020-10-02 16:50:51 +02:00
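
To illustrate the size limits named in the commit message above, here is a minimal sketch that picks the smallest MariaDB text type able to hold a given string. The byte limits are the documented maximums for these types; the helper name `smallest_text_type` is hypothetical and is not taken from the project.

```python
# Maximum lengths in bytes of the MariaDB text column types (2**n - 1).
TEXT_LIMITS = [
    ("text", 2**16 - 1),        # about 64 KB
    ("mediumtext", 2**24 - 1),  # about 16 MB
    ("longtext", 2**32 - 1),    # about 4 GB
]


def smallest_text_type(value: str) -> str:
    """Return the smallest column type whose byte limit fits `value`.

    Hypothetical helper for illustration only. The limits apply to the
    encoded byte length, not to the number of characters.
    """
    size = len(value.encode("utf-8"))
    for name, limit in TEXT_LIMITS:
        if size <= limit:
            return name
    raise ValueError(f"value of {size} bytes does not fit any text type")
```

Because the limits are counted in bytes, a string with many non-ASCII characters can overflow `text` well before it reaches 65,535 characters.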
c658677032 Autoformat 2020-09-09 00:29:32 +02:00
0599176ae8 Update scala to 2.13.3 2020-08-01 01:03:43 +02:00
da68405f9b Extract meta data from pdfs using pdfbox 2020-07-18 23:04:46 +02:00
347a029af8 Scalafix organize-imports 2020-06-28 21:20:47 +02:00
c41cdeefec Update scalafmt to 2.5.1 + scalafmtAll 2020-05-04 23:53:57 +02:00
9656ba62f4 scalafmtAll 2020-03-26 18:26:00 +01:00
cf7ccd572c Improve handling encodings
HTML and text files are no longer assumed to be UTF-8. The encoding is
now detected, which may not work for all files. The fallback is
UTF-8.

There is still a problem with emails that contain HTML parts not
encoded in UTF-8. The mail text is always returned as a string and the
original encoding is lost. The HTML is then stored as UTF-8 bytes,
but wkhtmltopdf reads it as latin1. It seems that the `--encoding`
setting doesn't override the encoding provided by the document.
2020-03-23 22:51:28 +01:00
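
The encoding problem described in the commit message above can be reproduced in a few lines. This is only a sketch: the commit does not say how the encoding is detected, so `decode_with_fallback` is a hypothetical helper that takes an already detected charset name and falls back to UTF-8. The second part shows what happens when UTF-8 bytes are read as latin1.

```python
from typing import Optional


def decode_with_fallback(data: bytes, detected: Optional[str]) -> str:
    """Decode `data` with a detected charset name, falling back to UTF-8.

    Hypothetical helper: `detected` stands in for whatever a real
    charset detector would report, or None if detection failed.
    """
    try:
        return data.decode(detected or "utf-8")
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable bytes: use the fallback.
        return data.decode("utf-8", errors="replace")


# The mismatch from the commit message: text written as UTF-8 bytes
# but read back as latin1.
html = "<p>Grüße</p>"
stored = html.encode("utf-8")
misread = stored.decode("latin-1")  # 'ü' becomes 'Ã¼'
```

Reading with latin1 never fails, since every byte maps to a character, which is why this kind of mismatch shows up as garbled text rather than as an error.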
6b1156182c Add support for eml (rfc822 email) files 2020-03-19 22:42:40 +01:00
4ed7a137f7 Add support for archive files
Each attachment is now first extracted into potentially multiple ones
if it is recognized as an archive. This is the first step in
processing. The original archive file is also stored, and the resulting
attachments are associated with their original archive.

Initial support is implemented for zip files.
2020-03-19 22:42:27 +01:00
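
The extraction step described in the commit message above could look roughly like the following. This is a sketch using Python's standard `zipfile` module, not the project's implementation; the `Attachment` class and the `expand` function are hypothetical names chosen for illustration.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attachment:
    name: str
    data: bytes
    archive: Optional[str] = None  # name of the archive this file came from


def expand(att: Attachment) -> List[Attachment]:
    """Return the attachment itself plus, for zip files, one per member.

    The original archive is kept as the first element, and every
    extracted file records the archive it was taken from.
    """
    buf = io.BytesIO(att.data)
    if not zipfile.is_zipfile(buf):
        return [att]
    result = [att]
    with zipfile.ZipFile(buf) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories carry no file content
            result.append(
                Attachment(info.filename, zf.read(info), archive=att.name)
            )
    return result
```

Files that are not archives pass through unchanged, so the same step can run first for every attachment.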
2f87065b2e sbt scalafmtAll 2020-02-25 20:55:00 +01:00
9b1349734e Convert some files to pdf 2020-02-19 02:03:10 +01:00
bd605b8c94 Add first drafts for converting 2020-02-18 01:31:22 +01:00
e0682464b5 Configure pdf extraction; move Logger and DataType to common 2020-02-17 14:01:36 +01:00
3d615181e0 Early draft for text extraction 2020-02-17 01:57:22 +01:00
8143a4edcc Adding extraction primitives 2020-02-16 21:37:26 +01:00
3deba44282 Rename example files 2020-02-15 12:52:24 +01:00
1309c8b7fa Move mimetype detection to docspell-files 2020-02-14 22:06:18 +01:00
5c3d2b2e28 Rename example-files to files 2020-02-14 11:14:09 +01:00