2020-03-19 21:42:58 +00:00
|
|
|
---
|
|
|
|
layout: docs
|
|
|
|
title: Archive Files
|
2020-03-28 15:35:28 +00:00
|
|
|
permalink: dev/adr/0013_archive_files
|
2020-03-19 21:42:58 +00:00
|
|
|
---
|
|
|
|
|
|
|
|
# {{ page.title }}
|
|
|
|
|
|
|
|
|
|
|
|
## Context and Problem Statement
|
|
|
|
|
|
|
|
Docspell should have support for files that contain the actual files
|
|
|
|
that matter, like zip files and other such things. It should extract
|
|
|
|
its contents automatcially.
|
|
|
|
|
|
|
|
Since docspell should never drop or modify user data, the archive file
|
|
|
|
must be present in the database. And it must be possible to download
|
|
|
|
the file unmodified.
|
|
|
|
|
|
|
|
On the other hand, files in there need to be text analysed and
|
|
|
|
converted to pdf files.
|
|
|
|
|
|
|
|
## Decision Outcome
|
|
|
|
|
|
|
|
There is currently a table `attachment_source` which holds references
|
|
|
|
to "original" files. These are the files as uploaded by the user,
|
|
|
|
before converted to pdf. Archive files add a subtlety to this: in case
|
|
|
|
of an archive, an `attachment_source` is the original (non-archive)
|
|
|
|
file inside an archive.
|
|
|
|
|
|
|
|
The archive file itself will be stored in a separate table `attachment_archive`.
|
|
|
|
|
|
|
|
Example: uploading a `files.zip` ZIP file containing `report.jpg`:
|
|
|
|
|
|
|
|
- `attachment_source`: report.jpg
|
|
|
|
- `attachment`: report.pdf
|
|
|
|
- `attachment_archive`: files.zip
|
|
|
|
|
|
|
|
Archive may contain other archives. Then the inner archives will not
|
|
|
|
be saved. The archive file is extracted recursively, until there is no
|
|
|
|
known archive file found.
|
|
|
|
|
|
|
|
## Initial Support
|
|
|
|
|
|
|
|
Initial support is implemented for ZIP and EML (e-mail files) files.
|