docspell

mirror of https://github.com/TheAnachronism/docspell.git synced 2025-06-21 18:08:25 +00:00

Author	SHA1	Message	Date
Eike Kettner	0599176ae8	Update scala to 2.13.3	2020-08-01 01:03:43 +02:00
Eike Kettner	3d49ceaab5	Use ocrmypdf tool to create pdf/a during conversion - Use another external tool to convert pdf to pdf which also adds the extracted text as another layer into the pdf - Although not used, the external conversion routine will now check for an existing text file that is named as the pdf file with extension `.txt`. If present it is included in the conversion result and will be used as the extracted text. - text extraction for pdf files happens now on the converted file, because it may already contain the text from the conversion step and thus avoids running OCR twice. - All errors during conversion are not fatal; processing continues without a converted file.	2020-07-18 17:19:29 +02:00
Eike Kettner	347a029af8	Scalafix organize-imports	2020-06-28 21:20:47 +02:00
Eike Kettner	56624515a5	ScalafmtAll	2020-05-25 13:56:06 +02:00
Eike Kettner	ee394eae86	Try to streamline the different impls for `MimeType`	2020-05-25 09:24:24 +02:00
Eike Kettner	c41cdeefec	Update scalafmt to 2.5.1 + scalafmtAll	2020-05-04 23:53:57 +02:00
Eike Kettner	b2ca314da9	Check code formatting with travis ci	2020-04-23 20:25:21 +02:00
Eike Kettner	362e1a5e14	Fix compile errors in test code	2020-04-07 23:00:25 +02:00
Eike Kettner	1206105f0b	Fix several bugs with handling e-mail files - When converting from html->pdf, the wkhtmltopdf program exits with errors if the document contains invalid links. The content is now cleaned before being handed to wkhtmltopdf. - Update emil library which fixes a bug when reading mails without explicit transfer encoding (8bit) - Add an info header to converted mails	2020-04-07 22:38:25 +02:00
Eike Kettner	aed5dfaff6	Fix mimetype extractors	2020-03-27 21:49:55 +01:00
Eike Kettner	9656ba62f4	scalafmtAll	2020-03-26 18:26:00 +01:00
Eike Kettner	cf7ccd572c	Improve handling encodings - Html and text files are not fixed to be UTF-8. The encoding is now detected, which may not work for all files. Default/fallback will be utf-8. - There is still a problem with mails that contain html parts not in utf8 encoding. The mail text is always returned as a string and the original encoding is lost. Then the html is stored using utf-8 bytes, but wkhtmltopdf reads it using latin1. It seems that the `--encoding` setting doesn't override encoding provided by the document.	2020-03-23 22:51:28 +01:00
Eike Kettner	3703dce9a6	Update fs2 to 2.3.0	2020-03-20 22:47:09 +01:00
Eike Kettner	2f87065b2e	sbt scalafmtAll	2020-02-25 20:55:00 +01:00
Eike Kettner	ec419c7bfd	Adapt nix modules to new config	2020-02-22 12:40:56 +01:00
Eike Kettner	97305d27ff	Integrate support for more files into processing and upload - The restriction that only pdf files can be uploaded is removed. All files can now be uploaded. The processing may not process all. It is still possible to restrict file uploads by types via a configuration.	2020-02-19 23:27:00 +01:00
Eike Kettner	9b1349734e	Convert some files to pdf	2020-02-19 02:03:10 +01:00
Eike Kettner	5869e2ee6e	Streamline extern-conv stdin/infile	2020-02-18 12:43:47 +01:00
Eike Kettner	0dcc00836b	Make logger configurable in system commands	2020-02-18 12:02:43 +01:00
Eike Kettner	bd605b8c94	Add first drafts for converting	2020-02-18 01:31:22 +01:00
Eike Kettner	c665c212a0	Early draft for running wkhtmltopdf	2020-02-17 14:02:23 +01:00
Eike Kettner	8143a4edcc	Adding extraction primitives	2020-02-16 21:37:26 +01:00
Eike Kettner	ce22b727b1	Add new convert module and sketch its integration	2020-02-11 00:33:52 +01:00

23 Commits