mirror of
https://github.com/TheAnachronism/docspell.git
synced 2024-11-13 02:31:10 +00:00
45 lines
1.3 KiB
Markdown
45 lines
1.3 KiB
Markdown
|
---
|
||
|
layout: docs
|
||
|
title: Archive Files
|
||
|
---
|
||
|
|
||
|
# {{ page.title }}
|
||
|
|
||
|
|
||
|
## Context and Problem Statement
|
||
|
|
||
|
Docspell should have support for files that contain the actual files
|
||
|
that matter, like zip files and other such things. It should extract
|
||
|
its contents automatcially.
|
||
|
|
||
|
Since docspell should never drop or modify user data, the archive file
|
||
|
must be present in the database. And it must be possible to download
|
||
|
the file unmodified.
|
||
|
|
||
|
On the other hand, files in there need to be text analysed and
|
||
|
converted to pdf files.
|
||
|
|
||
|
## Decision Outcome
|
||
|
|
||
|
There is currently a table `attachment_source` which holds references
|
||
|
to "original" files. These are the files as uploaded by the user,
|
||
|
before converted to pdf. Archive files add a subtlety to this: in case
|
||
|
of an archive, an `attachment_source` is the original (non-archive)
|
||
|
file inside an archive.
|
||
|
|
||
|
The archive file itself will be stored in a separate table `attachment_archive`.
|
||
|
|
||
|
Example: uploading a `files.zip` ZIP file containing `report.jpg`:
|
||
|
|
||
|
- `attachment_source`: report.jpg
|
||
|
- `attachment`: report.pdf
|
||
|
- `attachment_archive`: files.zip
|
||
|
|
||
|
Archive may contain other archives. Then the inner archives will not
|
||
|
be saved. The archive file is extracted recursively, until there is no
|
||
|
known archive file found.
|
||
|
|
||
|
## Initial Support
|
||
|
|
||
|
Initial support is implemented for ZIP and EML (e-mail files) files.
|