Commit Graph

24 Commits

Author SHA1 Message Date
Rehan Mahmood
2a39b2f6a6 Updated following dependencies as they need changes to the code to work properly:
- Scala
- fs2
- http4s
2023-10-31 14:24:00 -04:00
eikek
6cef9d4f07 Improve performance of zip/unzip
Adds tests and includes some cleanup
2022-06-18 16:39:57 +02:00
eikek
7fdd78ad06 Experiment with addons
Addons allow to execute external programs in some context inside
docspell. Currently it is possible to run them after processing files.
Addons are provided by URLs to zip files.
2022-05-15 23:46:43 +02:00
eikek
4488291319 Download multiple files as zip 2022-04-09 15:28:51 +02:00
eikek
071f4067bf Use existing mimetype detection when storing files 2021-09-23 14:10:24 +02:00
eikek
1761526e20 Simplify MimeType class and parse mimetypes in a more lenient way 2021-09-23 14:10:24 +02:00
eikek
9013f2de5b Update scalafmt settings 2021-09-22 17:23:24 +02:00
eikek
9785db0683 Change license header of all files 2021-09-21 22:35:38 +02:00
Scala Steward
e4fecefaea
Reformat with scalafmt 3.0.0 2021-08-19 08:50:30 +02:00
eikek
1901fe1a8c Adopt deprecated APIs from fs2; use fs2.Path 2021-08-07 17:51:56 +02:00
Scala Steward
558007235b Update tika-core to 2.0.0
Include new ODF parser from tika-2.0.0
2021-07-25 13:08:18 +02:00
eikek
8e5c88fd32 Add copyright header to source files 2021-07-04 10:57:53 +02:00
eikek
bd791b4593 Upgrade code base to CE3 2021-06-22 22:53:34 +02:00
Eike Kettner
4fd6e02ec0 Improve glob and filter archive entries 2020-11-11 21:01:23 +01:00
Eike Kettner
347a029af8 Scalafix organize-imports 2020-06-28 21:20:47 +02:00
Eike Kettner
c41cdeefec Update scalafmt to 2.5.1 + scalafmtAll 2020-05-04 23:53:57 +02:00
Eike Kettner
cf7ccd572c Improve handling encodings
Html and text files are not fixed to be UTF-8. The encoding is now
detected, which may not work for all files. Default/fallback will be
utf-8.

There is still a problem with mails that contain html parts not in
utf8 encoding. The mail text is always returned as a string and the
original encoding is lost. Then the html is stored using utf-8 bytes,
but wkhtmltopdf reads it using latin1. It seems that the `--encoding`
setting doesn't override encoding provided by the document.
2020-03-23 22:51:28 +01:00
Eike Kettner
6b1156182c Add support for eml (rfc822 email) files 2020-03-19 22:42:40 +01:00
Eike Kettner
4ed7a137f7 Add support for archive files
Each attachment is now first extracted into potentially multiple ones,
if it is recognized as an archive. This is the first step in
processing. The original archive file is also stored and the resulting
attachments are associated to their original archive.

First support is implemented for zip files.
2020-03-19 22:42:27 +01:00
Eike Kettner
2f87065b2e sbt scalafmtAll 2020-02-25 20:55:00 +01:00
Eike Kettner
9b1349734e Convert some files to pdf 2020-02-19 02:03:10 +01:00
Eike Kettner
e0682464b5 Configure pdf extraction; move Logger and DataType to common 2020-02-17 14:01:36 +01:00
Eike Kettner
8143a4edcc Adding extraction primitives 2020-02-16 21:37:26 +01:00
Eike Kettner
1309c8b7fa Move mimetype detection to docspell-files 2020-02-14 22:06:18 +01:00