docspell/website/site/content/docs/dev/adr/0013_archive_files.md

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

43 lines
1.3 KiB
Markdown
Raw Normal View History

2020-07-27 20:13:22 +00:00
+++
title = "Archive Files"
weight = 140
+++
# Context and Problem Statement
Docspell should have support for files that contain the actual files
that matter, like zip files and other such things. It should extract
its contents automatcially.
Since docspell should never drop or modify user data, the archive file
must be present in the database. And it must be possible to download
the file unmodified.
On the other hand, files in there need to be text analysed and
converted to pdf files.
# Decision Outcome
There is currently a table `attachment_source` which holds references
to "original" files. These are the files as uploaded by the user,
before converted to pdf. Archive files add a subtlety to this: in case
of an archive, an `attachment_source` is the original (non-archive)
file inside an archive.
The archive file itself will be stored in a separate table `attachment_archive`.
Example: uploading a `files.zip` ZIP file containing `report.jpg`:
- `attachment_source`: report.jpg
- `attachment`: report.pdf
- `attachment_archive`: files.zip
Archive may contain other archives. Then the inner archives will not
be saved. The archive file is extracted recursively, until there is no
known archive file found.
# Initial Support
Initial support is implemented for ZIP and EML (e-mail files) files.