
12 Commits

SHA1 Message Date
7fdd78ad06 Experiment with addons
Addons allow executing external programs in certain contexts inside
docspell. Currently they can be run after files have been processed.
Addons are provided as URLs pointing to zip files.
2022-05-15 23:46:43 +02:00
8b42708db2 Remove old log stuff 2022-02-19 22:01:49 +01:00
0b606e6b05 Use logfmt for log lines and remove ANSI color codes 2021-12-19 22:29:56 +01:00
3c93b63c8a Add option to decrypt PDFs during conversion
Refs: #1074
2021-09-29 23:04:26 +02:00
f01646aeb5 Reorganize NLP pipeline and add NLP-unsupported language Italian
Improves and reorganizes how NLP pipelines are set up. Users can now
choose from several options, depending on their hardware and usage
scenario.

This is the basis for supporting more languages without depending on
what stanford-nlp supports. For such languages, support is then limited
to text extraction and simple regex-NER processing.
2021-01-18 17:41:40 +01:00
e26d7129e7 Add fix for mariadb text columns
The `text` data type can only store up to 64 KB of data. `mediumtext`
can store up to 16 MB and `longtext` up to 4 GB.

Issue: #297
2020-10-02 16:50:51 +02:00
da68405f9b Extract metadata from PDFs using PDFBox 2020-07-18 23:04:46 +02:00
4ed7a137f7 Add support for archive files
If an attachment is recognized as an archive, it is now first extracted
into potentially multiple attachments. This is the first step in
processing. The original archive file is also stored, and the resulting
attachments are associated with their original archive.

Initial support is implemented for zip files.
2020-03-19 22:42:27 +01:00
bd605b8c94 Add first drafts for converting 2020-02-18 01:31:22 +01:00
8143a4edcc Adding extraction primitives 2020-02-16 21:37:26 +01:00
3deba44282 Rename example files 2020-02-15 12:52:24 +01:00
5c3d2b2e28 Rename example-files to files 2020-02-14 11:14:09 +01:00