Update documentation

Eike Kettner
2020-03-19 22:42:58 +01:00
parent 439aaee27b
commit d78bd4142c
4 changed files with 80 additions and 11 deletions


@ -15,7 +15,8 @@ immediately as long as there are enough resource.
What is missing is a component that maintains periodic tasks. The
reason for this is to have housekeeping tasks that run regularly and
clean up stale or unused data. Later, users should be able to create
periodic tasks, for example to read e-mails from an inbox or to be
notified of due items.
The problem, again, is that it must work with multiple job executor
instances running at the same time. This is the same pattern as with
@ -38,14 +39,16 @@ For internal housekeeping tasks, it may suffice to reuse the existing
`job` queue by adding more fields such that a job may be considered
periodic. But this conflates what the `Scheduler` is doing now
(executing tasks as soon as possible while being bound to some
resource limits) with a completely different subject.
There will be a new `PeriodicScheduler` that works on a new table in
the database representing periodic tasks. This table will share fields
with the `job` table to be able to create `RJob` records. This new
component only takes care of periodically submitting jobs to the job
queue, such that the `Scheduler` will eventually pick them up and run
them. If a task cannot run (for example due to resource limitations),
the periodic scheduler can do nothing but wait and try again next
time.
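To illustrate the split of responsibilities, here is a minimal Python sketch, not Docspell's actual code. It assumes a hypothetical `PeriodicTask` record with a fixed interval in place of the systemd-like timer, and a plain list standing in for the `job` queue. The periodic scheduler only enqueues jobs; running them is left to whatever consumes the queue.

```python
from dataclasses import dataclass


@dataclass
class PeriodicTask:
    # Hypothetical record; the real table uses a calendar-event `timer`.
    task_id: str
    interval: int  # seconds between runs (simplification of the timer)
    next_run: int  # epoch seconds of the next due time


def submit_due(tasks: list[PeriodicTask], queue: list[str], now: int) -> None:
    """Enqueue one job per due task and schedule the task's next run.

    Nothing is executed here; job ids are only appended to `queue`,
    which a separate job scheduler drains later.
    """
    for task in tasks:
        if task.next_run <= now:
            queue.append(f"{task.task_id}@{now}")
            # Advance the schedule whether or not the job will get to run;
            # if it cannot (e.g. resource limits), the task is simply due
            # again at its next slot.
            task.next_run = now + task.interval


tasks = [PeriodicTask("cleanup", interval=60, next_run=0)]
queue: list[str] = []
submit_due(tasks, queue, now=0)   # due: one job enqueued
submit_due(tasks, queue, now=30)  # not due yet
submit_due(tasks, queue, now=60)  # due again
print(queue)  # ['cleanup@0', 'cleanup@60']
```

Keeping submission and execution apart means the periodic part never has to know about resource limits; it just feeds the existing queue.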
```sql
CREATE TABLE "periodic_task" (
@ -65,11 +68,11 @@ CREATE TABLE "periodic_task" (
);
```
Preparing for other features, at some point periodic tasks will be
created by users. It should be possible to disable/enable them. The
next 6 properties are needed to insert jobs into the `job` table. The
`worker` and `marked` fields are used to mark a periodic task as
"being worked on by a job executor".
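The `worker`/`marked` pair suggests a compare-and-set style claim: an executor writes its id into `worker` only if no other executor currently holds the task. The following Python sketch uses the standard `sqlite3` module and a hypothetical, heavily reduced `periodic_task` table (only `id`, `worker` and `marked`, with `marked` assumed to hold a timestamp); the real schema and the query Docspell runs may differ.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced table: only the columns needed for claiming.
conn.execute(
    "CREATE TABLE periodic_task (id TEXT PRIMARY KEY, worker TEXT, marked INTEGER)"
)
conn.execute("INSERT INTO periodic_task VALUES ('cleanup', NULL, NULL)")
conn.commit()


def try_claim(conn: sqlite3.Connection, task_id: str, worker: str) -> bool:
    """Mark a task as taken by `worker`; return False if someone else has it.

    The `worker IS NULL` condition turns the UPDATE into a compare-and-set:
    when two executors race, only one UPDATE matches the row.
    """
    cur = conn.execute(
        "UPDATE periodic_task SET worker = ?, marked = ? "
        "WHERE id = ? AND worker IS NULL",
        (worker, int(time.time()), task_id),
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn: sqlite3.Connection, task_id: str, worker: str) -> None:
    """Clear the mark again, but only if `worker` still holds it."""
    conn.execute(
        "UPDATE periodic_task SET worker = NULL, marked = NULL "
        "WHERE id = ? AND worker = ?",
        (task_id, worker),
    )
    conn.commit()


print(try_claim(conn, "cleanup", "joex1"))  # True
print(try_claim(conn, "cleanup", "joex2"))  # False: already taken
release(conn, "cleanup", "joex1")
print(try_claim(conn, "cleanup", "joex2"))  # True
```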
The `timer` is the schedule, which is a
[systemd-like](https://man.cx/systemd.time#heading7) calendar event


@ -0,0 +1,44 @@
---
layout: docs
title: Archive Files
---
# {{ page.title }}
## Context and Problem Statement
Docspell should support archive files, such as ZIP files, that contain
the files which actually matter. It should extract their contents
automatically.
Since docspell should never drop or modify user data, the archive file
must be present in the database. And it must be possible to download
the file unmodified.
On the other hand, the files inside the archive need to be
text-analysed and converted to PDF files.
## Decision Outcome
There is currently a table `attachment_source` which holds references
to "original" files. These are the files as uploaded by the user,
before being converted to PDF. Archive files add a subtlety to this: in
case of an archive, an `attachment_source` is the original
(non-archive) file inside the archive.
The archive file itself will be stored in a separate table `attachment_archive`.
Example: uploading a `files.zip` ZIP file containing `report.jpg`:
- `attachment_source`: report.jpg
- `attachment`: report.pdf
- `attachment_archive`: files.zip
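The example can be expressed as a small function that decides which file ends up in which table. This is an illustrative Python sketch only: the `Rows` type, the `rows_for_upload` helper and the `.pdf` naming rule are assumptions for the example, not Docspell's data model.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Rows:
    # One list per table from the example (hypothetical representation).
    attachment_source: list[str] = field(default_factory=list)
    attachment: list[str] = field(default_factory=list)
    attachment_archive: list[str] = field(default_factory=list)


def rows_for_upload(archive_name: str, members: list[str]) -> Rows:
    """Map an uploaded archive and its (non-archive) members to table rows."""
    rows = Rows()
    # The archive itself is kept unmodified so it can be downloaded again.
    rows.attachment_archive.append(archive_name)
    for name in members:
        # Each extracted file is an "original" ...
        rows.attachment_source.append(name)
        # ... and its PDF conversion becomes the attachment.
        rows.attachment.append(str(PurePosixPath(name).with_suffix(".pdf")))
    return rows


print(rows_for_upload("files.zip", ["report.jpg"]))
# Rows(attachment_source=['report.jpg'], attachment=['report.pdf'], attachment_archive=['files.zip'])
```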
An archive may contain other archives. In that case, the inner
archives are not saved. The archive file is extracted recursively
until no more known archive files are found.
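Recursive extraction, where inner archives are unpacked but not kept, can be sketched with Python's standard `zipfile` module. This handles ZIP only, and `extract_recursive` is a hypothetical helper, not Docspell code: it returns the non-archive files from all nesting levels, while the caller is assumed to store only the outermost archive. A production version would also need limits against oversized or deeply nested ("zip bomb") inputs.

```python
import io
import zipfile


def extract_recursive(data: bytes, prefix: str = "") -> dict[str, bytes]:
    """Return all non-archive files inside a ZIP, descending into nested ZIPs.

    Inner archives are opened and their contents collected, but the inner
    archive files themselves are not part of the result.
    """
    result: dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            content = zf.read(info)
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(content)):
                # Nested archive: recurse instead of keeping it.
                result.update(extract_recursive(content, path + "/"))
            else:
                result[path] = content
    return result


def make_zip(files: dict[str, bytes]) -> bytes:
    """Build an in-memory ZIP for the usage example."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


inner = make_zip({"scan.png": b"png bytes"})
outer = make_zip({"report.jpg": b"jpeg bytes", "more.zip": inner})
print(sorted(extract_recursive(outer)))  # ['more.zip/scan.png', 'report.jpg']
```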
## Initial Support
Initial support is implemented for ZIP and EML (e-mail) files.
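For EML files, Python's standard `email` package can play the role of the extractor. A minimal sketch, assuming that "extracting" an e-mail means collecting its attachments; whether Docspell also turns the message body into a file is not covered here. The helper name `extract_eml` is hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_eml(data: bytes) -> dict[str, bytes]:
    """Return the attachments of an RFC 822 message as {filename: bytes}."""
    msg = message_from_bytes(data, policy=policy.default)
    files: dict[str, bytes] = {}
    for part in msg.iter_attachments():
        name = part.get_filename() or "unnamed"
        # decode=True undoes base64/quoted-printable transfer encoding.
        payload = part.get_payload(decode=True)
        # Attached messages (message/rfc822) yield None and are skipped here.
        if payload is not None:
            files[name] = payload
    return files


# Build a small test message with one attachment.
mail = EmailMessage()
mail["Subject"] = "Scans"
mail.set_content("See attached.")
mail.add_attachment(b"jpeg bytes", maintype="image", subtype="jpeg",
                    filename="report.jpg")

print(extract_eml(mail.as_bytes()))  # {'report.jpg': b'jpeg bytes'}
```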


@ -25,6 +25,15 @@ compete on getting the next job from the queue. After a job finishes
and no job is waiting in the queue, joex will sleep until notified
again. It will also periodically notify itself as a fallback.
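The "sleep until notified, but also wake up periodically" behaviour amounts to a wait with a timeout. A minimal Python sketch using `threading.Event`; the `Waker` class and the interval values are assumptions for illustration, not joex code or configuration.

```python
import threading


class Waker:
    """Lets a worker sleep until notified, or until a fallback timeout."""

    def __init__(self, fallback_seconds: float) -> None:
        self._event = threading.Event()
        self._fallback = fallback_seconds

    def notify(self) -> None:
        # Called, for example, after a new job was added to the queue.
        self._event.set()

    def sleep(self) -> bool:
        """Block until notify() is called or the fallback timeout elapses.

        Returns True if woken by notify(), False on timeout. The caller is
        expected to check the queue after every wake-up, so a notify() that
        races with clear() below does not leave a job unnoticed (assuming
        jobs are enqueued before notify() is called).
        """
        woken = self._event.wait(timeout=self._fallback)
        self._event.clear()
        return woken


waker = Waker(fallback_seconds=0.5)
threading.Timer(0.01, waker.notify).start()
print(waker.sleep())  # True: woken by notify()
print(waker.sleep())  # False: fallback timeout elapsed
```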
## Task vs Job
Just for the sake of this document, a task denotes the code that has
to be executed, or the thing that has to be done. It becomes a job
once it is submitted into the queue, from where it will eventually be
picked up and executed. A job maintains a state and other things,
while a task is just code.
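The distinction can be pictured as a plain function (the task) wrapped in a record that carries state (the job). The Python sketch below is illustrative only; `JobState`, `Job` and their fields are assumptions and need not match Docspell's `RJob`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class JobState(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


# A task is just code: something that can be executed.
Task = Callable[[], None]


@dataclass
class Job:
    """A submitted task plus the state the queue needs to track it."""
    job_id: str
    task: Task
    state: JobState = JobState.WAITING

    def run(self) -> None:
        self.state = JobState.RUNNING
        try:
            self.task()
            self.state = JobState.SUCCESS
        except Exception:
            # A real executor would record the error and maybe retry.
            self.state = JobState.FAILED


def hello() -> None:
    print("hello from a task")


job = Job("job-1", hello)  # submitting the task creates a job
job.run()
print(job.state)  # JobState.SUCCESS
```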
## Scheduler and Queue
The scheduler is the part that runs and monitors the long running
@ -115,6 +124,15 @@ reach a joex component. This periodic wakup is just to ensure that
jobs are eventually run.
## Periodic Tasks
The job executor can execute tasks periodically. These tasks are
stored in the database such that they can be submitted into the job
queue. Multiple job executors can run at once, but only one of them
ever works on a given task, so a periodic task is never submitted
twice. It is also not submitted if the previous job for that task has
not finished yet.
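The rule that a periodic task is not submitted again while its previous job is unfinished could look like the following Python sketch. The in-memory `jobs` mapping, the `State` values and `maybe_submit` are hypothetical stand-ins for the database; that only one executor handles a task at a time is assumed to be ensured separately.

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    DONE = "done"


# Hypothetical view of the job queue: periodic task id -> state of its
# most recently submitted job.
jobs: dict[str, State] = {}


def maybe_submit(task_id: str) -> bool:
    """Submit a job for `task_id` unless its previous job is unfinished."""
    previous = jobs.get(task_id)
    if previous in (State.WAITING, State.RUNNING):
        return False  # skip this round; try again at the next due time
    jobs[task_id] = State.WAITING
    return True


print(maybe_submit("cleanup"))  # True: first submission
print(maybe_submit("cleanup"))  # False: previous job still waiting
jobs["cleanup"] = State.DONE
print(maybe_submit("cleanup"))  # True: previous job finished
```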
## Starting on demand
The job executor and rest server can be started multiple times. This
@ -129,6 +147,7 @@ all have unique `app-id`s.
Once the files have been processed, you can stop the additional
executors.
## Shutting down
If a job executor is sleeping and not executing any jobs, you can just


@ -28,6 +28,9 @@ title: Features and Limitations
  - Images (jpg, png, tiff)
  - HTML
  - text/* (treated as Markdown)
  - zip
  - [eml](https://en.wikipedia.org/wiki/Email#Filename_extensions)
    (e-mail files in plain text MIME)
- Tools:
  - Watch a folder: watch folders for changes and send files to docspell
  - Firefox plugin: right click on a link and send the file to docspell