mirror of
https://github.com/TheAnachronism/docspell.git
synced 2025-06-21 18:08:25 +00:00
Update documentation
@@ -15,7 +15,8 @@ immediately – as long as there are enough resources.
 What is missing, is a component that maintains periodic tasks. The
 reason for this is to have housekeeping tasks that run regularly and
 clean up stale or unused data. Later, users should be able to create
-periodic tasks, for example to read e-mails from an inbox.
+periodic tasks, for example to read e-mails from an inbox or to be
+notified of due items.

 The problem is again, that it must work with multiple job executor
 instances running at the same time. This is the same pattern as with
@@ -38,14 +39,16 @@ For internal housekeeping tasks, it may suffice to reuse the existing
 `job` queue by adding more fields such that a job may be considered
 periodic. But this conflates with what the `Scheduler` is doing now
 (executing tasks as soon as possible while being bound to some
-resources) with a completely different subject.
+resource limits) with a completely different subject.

 There will be a new `PeriodicScheduler` that works on a new table in
 the database that is representing periodic tasks. This table will
-share fields with the `job` table to be able to create `RJob`
-instances. This new component is only taking care of periodically
-submitting jobs to the job queue such that the `Scheduler` will
-eventually pick it up and run it.
+share fields with the `job` table to be able to create `RJob` records.
+This new component is only taking care of periodically submitting jobs
+to the job queue such that the `Scheduler` will eventually pick them up
+and run them. If the tasks cannot run (for example due to resource
+limitation), the periodic scheduler can do nothing but wait and try
+next time.

 ```sql
 CREATE TABLE "periodic_task" (
@@ -65,11 +68,11 @@ CREATE TABLE "periodic_task" (
 );
 ```

-Preparing for other features, periodic tasks will be created by users.
-It should be possible to disable/enable them. The next 6 properties
-are needed to insert jobs into the `job` table. The `worker` field
-(and `marked`) are used to mark a periodic job as "being worked on by
-a job executor".
+Preparing for other features, at some point periodic tasks will be
+created by users. It should be possible to disable/enable them. The
+next 6 properties are needed to insert jobs into the `job` table. The
+`worker` field (and `marked`) are used to mark a periodic job as
+"being worked on by a job executor".

 The `timer` is the schedule, which is a
 [systemd-like](https://man.cx/systemd.time#heading7) calendar event
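The division of labour described in this ADR (the periodic scheduler only submits, the `Scheduler` runs) can be sketched roughly as follows. This is a minimal illustration in Python, not docspell's Scala code: `PeriodicTask`, `tick`, and the in-memory `queue` are hypothetical names, and a fixed interval stands in for the systemd-like calendar event.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PeriodicTask:
    # Hypothetical stand-in for one row of the periodic_task table.
    name: str
    interval: timedelta  # simplification: the ADR uses a calendar event
    next_run: datetime
    enabled: bool = True


def tick(tasks: list[PeriodicTask], queue: list[str], now: datetime) -> None:
    """Submit every due task to the job queue; nothing is executed here."""
    for task in tasks:
        if not task.enabled or task.next_run > now:
            continue
        if task.name in queue:
            # A previous job is still queued: wait and try again next time.
            continue
        queue.append(task.name)  # the regular scheduler picks it up later
        task.next_run = now + task.interval
```

Note that a skipped task keeps its old `next_run`, so it is simply reconsidered on the next call to `tick`.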
modules/microsite/docs/dev/adr/0013_archive_files.md (new file, 44 lines)
@@ -0,0 +1,44 @@
+---
+layout: docs
+title: Archive Files
+---
+
+# {{ page.title }}
+
+
+## Context and Problem Statement
+
+Docspell should support archive files, such as zip files, that merely
+contain the files that actually matter. It should extract their
+contents automatically.
+
+Since docspell should never drop or modify user data, the archive file
+must be present in the database. And it must be possible to download
+the file unmodified.
+
+On the other hand, the files inside it need to be text analysed and
+converted to pdf files.
+
+## Decision Outcome
+
+There is currently a table `attachment_source` which holds references
+to "original" files. These are the files as uploaded by the user,
+before being converted to pdf. Archive files add a subtlety to this:
+in case of an archive, an `attachment_source` is the original
+(non-archive) file inside an archive.
+
+The archive file itself will be stored in a separate table `attachment_archive`.
+
+Example: uploading a `files.zip` ZIP file containing `report.jpg`:
+
+- `attachment_source`: report.jpg
+- `attachment`: report.pdf
+- `attachment_archive`: files.zip
+
+Archives may contain other archives. In that case the inner archives
+are not saved. The archive file is extracted recursively until no
+known archive file is found.
+
+## Initial Support
+
+Initial support is implemented for ZIP and EML (e-mail) files.
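The recursive extraction rule from the "Decision Outcome" section of this new file can be illustrated with a short Python sketch. It is an assumption-laden illustration, not docspell's implementation: `extract_recursively` is a hypothetical name, archives are recognised only by a `.zip` file name suffix, and EML files are not handled.

```python
import io
import zipfile


def extract_recursively(name: str, data: bytes) -> list[tuple[str, bytes]]:
    """Return the non-archive files inside `data`, descending into nested zips.

    Inner archives are unpacked but not returned themselves, mirroring the
    rule that inner archives are not saved.
    """
    if not name.lower().endswith(".zip"):
        # Not a known archive: this is an "original" file.
        return [(name, data)]

    found: list[tuple[str, bytes]] = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            found.extend(extract_recursively(info.filename, archive.read(info)))
    return found
```

For an uploaded `files.zip`, the returned entries would correspond to `attachment_source` records, while the unmodified upload itself would go to `attachment_archive`.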
@@ -25,6 +25,15 @@ compete on getting the next job from the queue. After a job finishes
 and no job is waiting in the queue, joex will sleep until notified
 again. It will also periodically notify itself as a fallback.

+## Task vs Job
+
+Just for the sake of this document, a task denotes the code that has
+to be executed or the thing that has to be done. It becomes a job
+once it is submitted into the queue, from where it will eventually be
+picked up and executed. A job maintains a state and related data,
+while a task is just code.
+
+
 ## Scheduler and Queue

 The scheduler is the part that runs and monitors the long running
@@ -115,6 +124,15 @@ reach a joex component. This periodic wakeup is just to ensure that
 jobs are eventually run.


+## Periodic Tasks
+
+The job executor can execute tasks periodically. These tasks are
+stored in the database such that they can be submitted into the job
+queue. Multiple job executors can run at once, but only one of them
+works on a given task, so a periodic task is never submitted twice. It
+is also not submitted if a previous run has not finished yet.
+
+
 ## Starting on demand

 The job executor and rest server can be started multiple times. This
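One way to ensure that only one of several job executors handles a given periodic task is a conditional update on a `worker` column. The following is a minimal sketch under stated assumptions: it uses SQLite and a reduced, hypothetical table layout with only `id` and `worker`, and `try_claim` and `release` are illustrative names rather than docspell's API.

```python
import sqlite3


def try_claim(conn: sqlite3.Connection, task_id: str, worker: str) -> bool:
    """Mark the task as taken by `worker`; True only if nobody held it."""
    cur = conn.execute(
        "UPDATE periodic_task SET worker = ? WHERE id = ? AND worker IS NULL",
        (worker, task_id),
    )
    conn.commit()
    # Exactly one row changes for the winner, zero rows for everyone else.
    return cur.rowcount == 1


def release(conn: sqlite3.Connection, task_id: str, worker: str) -> None:
    """Give the task back, but only if `worker` is the one holding it."""
    conn.execute(
        "UPDATE periodic_task SET worker = NULL WHERE id = ? AND worker = ?",
        (task_id, worker),
    )
    conn.commit()
```

Because the check and the write happen in a single `UPDATE`, two executors racing for the same row cannot both see `rowcount == 1`.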
@@ -129,6 +147,7 @@ all have unique `app-id`s.
 Once the files have been processed you can stop the additional
 executors.

+
 ## Shutting down

 If a job executor is sleeping and not executing any jobs, you can just
@@ -28,6 +28,9 @@ title: Features and Limitations
 - Images (jpg, png, tiff)
 - HTML
 - text/* (treated as Markdown)
+- zip
+- [eml](https://en.wikipedia.org/wiki/Email#Filename_extensions)
+(e-mail files in plain text MIME)
 - Tools:
 - Watch a folder: watch folders for changes and send files to docspell
 - Firefox plugin: right click on a link and send the file to docspell