docspell

mirror of https://github.com/TheAnachronism/docspell.git synced 2025-06-24 03:18:26 +00:00

Author	SHA1	Message	Date
eikek	8269a73a83	Extend config for external commands (#2536) Allows configuring external commands and providing different arguments based on runtime values, like language. It extends the current config of a command to allow an `arg-mappings` section. An example for ocrmypdf: ```conf ocrmypdf = { enabled = true command = { program = "ocrmypdf" ### new arg-mappings arg-mappings = { "mylang" = { value = "{{lang}}" mappings = [ { matches = "deu" args = [ "-l", "deu", "--pdf-renderer", "sandwich" ] }, { matches = ".*" args = [ "-l", "{{lang}}" ] } ] } } #### end new arg-mappings args = [ ### will be replaced with corresponding args from "mylang" mapping "{{mylang}}", "--skip-text", "--deskew", "-j", "1", "{{infile}}", "{{outfile}}" ] timeout = "5 minutes" } working-dir = ${java.io.tmpdir}"/docspell-convert" } ``` The whole section will first be processed to replace all `{{…}}` patterns with corresponding values. Then `arg-mappings` will be looked at and the first match (value == matches) in its `mappings` array is used to replace its name in the arguments to the command.	2024-03-08 21:34:42 +01:00
eikek	924aaf720e	Fix compile warnings after scala update	2024-03-03 18:43:54 +01:00
eikek	dd763e7796	Fix potential infinite loop The code removed here was copied from another project some years back. Now there is an improved version in fs2 that can be used. Fixes: #2376	2023-11-12 13:04:03 +01:00
eikek	fe4a300b0e	Update pdfbox to 3.0.0	2023-11-06 00:06:49 +01:00
Rehan Mahmood	2a39b2f6a6	Updated following dependencies as they need changes to the code to work properly: - Scala - fs2 - http4s	2023-10-31 14:24:00 -04:00
eikek	85094cc1f6	Fix html conversion for text files It must honor the configuration when doing html->pdf.	2023-01-09 18:17:23 +01:00
eikek	df75fbddcd	Allow to convert html->pdf via weasyprint	2022-11-07 10:31:25 +01:00
eikek	7fdd78ad06	Experiment with addons Addons allow to execute external programs in some context inside docspell. Currently it is possible to run them after processing files. Addons are provided by URLs to zip files.	2022-05-15 23:46:43 +02:00
eikek	9eb9497675	Fix logging in tests	2022-02-19 23:33:01 +01:00
eikek	e483a97de7	Adapt to new logging API	2022-02-19 21:41:38 +01:00
eikek	aa8f3b82fc	Use passwords when reading PDFs	2021-09-30 11:48:59 +02:00
eikek	3c93b63c8a	Add option to decrypt PDFs during conversion Refs: #1074	2021-09-29 23:04:26 +02:00
eikek	9013f2de5b	Update scalafmt settings	2021-09-22 17:23:24 +02:00
eikek	9785db0683	Change license header of all files	2021-09-21 22:35:38 +02:00
Scala Steward	e4fecefaea	Reformat with scalafmt 3.0.0	2021-08-19 08:50:30 +02:00
eikek	1901fe1a8c	Adopt deprecated APIs from fs2; use fs2.Path	2021-08-07 17:51:56 +02:00
eikek	8e5c88fd32	Add copyright header to source files	2021-07-04 10:57:53 +02:00
eikek	bd791b4593	Upgrade code base to CE3	2021-06-22 22:53:34 +02:00
Eike Kettner	e1bbc2edf5	Apply autoformat	2021-04-10 16:31:58 +02:00
Eike Kettner	6a63694a3e	Convert unit tests to munit	2021-03-10 19:48:56 +01:00
Eike Kettner	3fabe0a582	Update to Scala 2.13.4	2020-11-27 20:26:24 +01:00
Eike Kettner	6db5c39d78	Fix converted filename Mark it by default with a string from the config file. Issue: 397	2020-11-08 09:45:03 +01:00
Eike Kettner	dd89e05cc2	Convert exceptions when converting to pdf into an error result The file processing tries pdf conversion once and keeps going if it fails. Some errors (e.g. timeouts) are raised via an exception. Issue: #387	2020-10-26 19:51:02 +01:00
Eike Kettner	c658677032	Autoformat	2020-09-09 00:29:32 +02:00
Eike Kettner	0599176ae8	Update scala to 2.13.3	2020-08-01 01:03:43 +02:00
Eike Kettner	3d49ceaab5	Use ocrmypdf tool to create pdf/a during conversion - Use another external tool to convert pdf to pdf which also adds the extracted text as another layer into the pdf - Although not used, the external conversion routine will now check for an existing text file that is named as the pdf file with extension `.txt`. If present it is included in the conversion result and will be used as the extracted text. - text extraction for pdf files happens now on the converted file, because it may already contain the text from the conversion step and thus avoids running OCR twice. - All errors during conversion are not fatal; processing continues without a converted file.	2020-07-18 17:19:29 +02:00
Eike Kettner	347a029af8	Scalafix organize-imports	2020-06-28 21:20:47 +02:00
Eike Kettner	56624515a5	ScalafmtAll	2020-05-25 13:56:06 +02:00
Eike Kettner	ee394eae86	Try to streamline the different impls for `MimeType`	2020-05-25 09:24:24 +02:00
Eike Kettner	c41cdeefec	Update scalafmt to 2.5.1 + scalafmtAll	2020-05-04 23:53:57 +02:00
Eike Kettner	b2ca314da9	Check code formatting with travis ci	2020-04-23 20:25:21 +02:00
Eike Kettner	362e1a5e14	Fix compile errors in test code	2020-04-07 23:00:25 +02:00
Eike Kettner	1206105f0b	Fix several bugs with handling e-mail files - When converting from html->pdf, the wkhtmltopdf program exits with errors if the document contains invalid links. The content is now cleaned before being handed to wkhtmltopdf. - Update emil library which fixes a bug when reading mails without explicit transfer encoding (8bit) - Add an info header to converted mails	2020-04-07 22:38:25 +02:00
Eike Kettner	aed5dfaff6	Fix mimetype extractors	2020-03-27 21:49:55 +01:00
Eike Kettner	9656ba62f4	scalafmtAll	2020-03-26 18:26:00 +01:00
Eike Kettner	cf7ccd572c	Improve handling encodings Html and text files are not fixed to be UTF-8. The encoding is now detected, which may not work for all files. Default/fallback will be utf-8. There is still a problem with mails that contain html parts not in utf8 encoding. The mail text is always returned as a string and the original encoding is lost. Then the html is stored using utf-8 bytes, but wkhtmltopdf reads it using latin1. It seems that the `--encoding` setting doesn't override encoding provided by the document.	2020-03-23 22:51:28 +01:00
Eike Kettner	3703dce9a6	Update fs2 to 2.3.0	2020-03-20 22:47:09 +01:00
Eike Kettner	2f87065b2e	sbt scalafmtAll	2020-02-25 20:55:00 +01:00
Eike Kettner	ec419c7bfd	Adapt nix modules to new config	2020-02-22 12:40:56 +01:00
Eike Kettner	97305d27ff	Integrate support for more files into processing and upload The restriction that only pdf files can be uploaded is removed. All files can now be uploaded. The processing may not process all. It is still possible to restrict file uploads by types via a configuration.	2020-02-19 23:27:00 +01:00
Eike Kettner	9b1349734e	Convert some files to pdf	2020-02-19 02:03:10 +01:00
Eike Kettner	5869e2ee6e	Streamline extern-conv stdin/infile	2020-02-18 12:43:47 +01:00
Eike Kettner	0dcc00836b	Make logger configurable in system commands	2020-02-18 12:02:43 +01:00
Eike Kettner	bd605b8c94	Add first drafts for converting	2020-02-18 01:31:22 +01:00
Eike Kettner	c665c212a0	Early draft for running wkhtmltopdf	2020-02-17 14:02:23 +01:00
Eike Kettner	8143a4edcc	Adding extraction primitives	2020-02-16 21:37:26 +01:00
Eike Kettner	ce22b727b1	Add new convert module and sketch its integration	2020-02-11 00:33:52 +01:00

47 Commits
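
The `arg-mappings` resolution described in commit 8269a73a83 can be illustrated with a small Python sketch. This is not docspell's implementation (the project is written in Scala); the function names `substitute` and `resolve_args` are hypothetical, and two behaviours are assumptions: `matches` is treated as a regular expression that must match the whole value, and a mapping name with no matching entry is left in the argument list unchanged.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def substitute(text, values):
    """Replace each {{name}} with values[name]; unknown names stay as-is."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)


def resolve_args(command, values):
    """Return the final argument list for a command config (a plain dict)."""
    # Step 1: for every mapping, fill in its value and pick the FIRST
    # entry whose `matches` pattern matches that value (assumed: full match).
    chosen = {}
    for name, mapping in command.get("arg-mappings", {}).items():
        value = substitute(mapping["value"], values)
        for entry in mapping["mappings"]:
            if re.fullmatch(entry["matches"], value):
                chosen[name] = [substitute(a, values) for a in entry["args"]]
                break

    # Step 2: an argument that is exactly {{name}} for a chosen mapping is
    # replaced by that mapping's args; everything else gets plain substitution.
    resolved = []
    for arg in command["args"]:
        m = PLACEHOLDER.fullmatch(arg)
        if m and m.group(1) in chosen:
            resolved.extend(chosen[m.group(1)])
        else:
            resolved.append(substitute(arg, values))
    return resolved


# The ocrmypdf example from the commit message, written as a Python dict.
OCRMYPDF = {
    "program": "ocrmypdf",
    "arg-mappings": {
        "mylang": {
            "value": "{{lang}}",
            "mappings": [
                {"matches": "deu", "args": ["-l", "deu", "--pdf-renderer", "sandwich"]},
                {"matches": ".*", "args": ["-l", "{{lang}}"]},
            ],
        }
    },
    "args": ["{{mylang}}", "--skip-text", "--deskew", "-j", "1",
             "{{infile}}", "{{outfile}}"],
}

if __name__ == "__main__":
    runtime = {"lang": "deu", "infile": "in.pdf", "outfile": "out.pdf"}
    print(resolve_args(OCRMYPDF, runtime))
```

With `lang` set to `deu` the first mapping entry wins and the sandwich renderer arguments are used; any other language falls through to the `.*` entry and yields `-l <lang>`.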