Invalid items are those that are not ready, and not shown to the user.
When changing metadata, it should only be changed, if the item was not
already shown to the user.
Applications running next to docspell may want a way to upload files
to any collective for integration purposes. This endpoint can be used
for this. It is disabled by default and can be enabled via the
configuration file.
- When converting from html->pdf, the wkhtmltopdf program exits with
errors if the document contains invalid links. The content is now
cleaned before handed to wkhtmltopdf.
- Update emil library which fixes a bug when reading mails without
explicit transfer encoding (8bit)
- Add a info header to converted mails
Html and text files are not fixed to be UTF-8. The encoding is now
detected, which may not work for all files. Default/fallback will be
utf-8.
There is still a problem with mails that contain html parts not in
utf8 encoding. The mail text is always returned as a string and the
original encoding is lost. Then the html is stored using utf-8 bytes,
but wkhtmltopdf reads it using latin1. It seems that the `--encoding`
setting doesn't override encoding provided by the document.
Periodic tasks are special in that they are usually kept around and
started based on a schedule. A new component checks periodic tasks and
submits them in the queue once they are due.
In order to avoid duplicate periodic jobs, the tracker of a job is
used to store the periodic job id. Each time a periodic task is due,
it is first checked if there is a job running (or queued) for this
task.
The restriction that only pdf files can be uploaded is removed. All
files can now be uploaded. The processing may not process all. It is
still possible to restrict file uploads by types via a configuration.
First, files are be converted to PDF for archiving. It is also easier
to create a preview. This is done via the `ConvertPdf` processing
task (which is not yet implemented).
Text extraction then tries first with the original file. If that
fails, OCR is done on the (potentially) converted pdf file.
To not loose information of the original file, it is saved using the
table `attachment_source`. If the original file is already a pdf, or
the conversion did not succeed, the `attachment` and
`attachment_source` record point to the same file.