mirror of
https://github.com/TheAnachronism/docspell.git
synced 2025-06-21 18:08:25 +00:00
Update documentation
@ -15,7 +15,8 @@ immediately – as long as there are enough resource.
 What is missing is a component that maintains periodic tasks. The
 reason for this is to have housekeeping tasks that run regularly and
 clean up stale or unused data. Later, users should be able to create
-periodic tasks, for example to read e-mails from an inbox.
+periodic tasks, for example to read e-mails from an inbox or to be
+notified of due items.

 The problem is again, that it must work with multiple job executor
 instances running at the same time. This is the same pattern as with
@ -38,14 +39,16 @@ For internal housekeeping tasks, it may suffice to reuse the existing
 `job` queue by adding more fields such that a job may be considered
 periodic. But this conflates what the `Scheduler` is doing now
 (executing tasks as soon as possible while being bound to some
-resources) with a completely different subject.
+resource limits) with a completely different subject.

 There will be a new `PeriodicScheduler` that works on a new table in
 the database that is representing periodic tasks. This table will
-share fields with the `job` table to be able to create `RJob`
-instances. This new component is only taking care of periodically
-submitting jobs to the job queue such that the `Scheduler` will
-eventually pick it up and run it.
+share fields with the `job` table to be able to create `RJob` records.
+This new component is only taking care of periodically submitting jobs
+to the job queue such that the `Scheduler` will eventually pick them up
+and run them. If the tasks cannot run (for example due to resource
+limitation), the periodic scheduler can do nothing but wait and try
+next time.

```sql
 CREATE TABLE "periodic_task" (
@ -65,11 +68,11 @@ CREATE TABLE "periodic_task" (
 );
```

-Preparing for other features, periodic tasks will be created by users.
-It should be possible to disable/enable them. The next 6 properties
-are needed to insert jobs into the `job` table. The `worker` field
-(and `marked`) are used to mark a periodic job as "being worked on by
-a job executor".
+Preparing for other features, at some point periodic tasks will be
+created by users. It should be possible to disable/enable them. The
+next 6 properties are needed to insert jobs into the `job` table. The
+`worker` field (and `marked`) are used to mark a periodic job as
+"being worked on by a job executor".

 The `timer` is the schedule, which is a
 [systemd-like](https://man.cx/systemd.time#heading7) calendar event
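The `worker` and `marked` fields described in the hunk above suggest a claim-by-update pattern. The following is only a minimal sketch of that idea, not docspell's implementation: the table layout is an assumption (the full `CREATE TABLE` is elided in this diff), the `id` and `enabled` columns and both function names are hypothetical, and SQLite stands in for whatever database is actually used.

```python
import sqlite3


def claim_periodic_task(conn, task_id, worker_id, now):
    """Try to mark a task as being worked on; True only for the winner."""
    # The WHERE clause makes the claim atomic: the row is updated only if
    # no other executor has set `worker` yet and the task is enabled.
    cur = conn.execute(
        "UPDATE periodic_task SET worker = ?, marked = ? "
        "WHERE id = ? AND worker IS NULL AND enabled = 1",
        (worker_id, now, task_id),
    )
    conn.commit()
    return cur.rowcount == 1


def release_periodic_task(conn, task_id, worker_id):
    """Clear the mark so that the task can be claimed again later."""
    conn.execute(
        "UPDATE periodic_task SET worker = NULL, marked = NULL "
        "WHERE id = ? AND worker = ?",
        (task_id, worker_id),
    )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical, reduced schema: only the columns this sketch needs.
    conn.execute(
        "CREATE TABLE periodic_task ("
        " id TEXT PRIMARY KEY, enabled INTEGER NOT NULL,"
        " worker TEXT, marked INTEGER)"
    )
    conn.execute(
        "INSERT INTO periodic_task VALUES ('housekeeping', 1, NULL, NULL)"
    )
    first = claim_periodic_task(conn, "housekeeping", "joex1", 1000)
    second = claim_periodic_task(conn, "housekeeping", "joex2", 1001)
    release_periodic_task(conn, "housekeeping", "joex1")
    third = claim_periodic_task(conn, "housekeeping", "joex2", 1002)
    return first, second, third
```

In this sketch the second executor loses the race while the first holds the mark, and can claim the task once it has been released.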
44
modules/microsite/docs/dev/adr/0013_archive_files.md
Normal file
@ -0,0 +1,44 @@
+---
+layout: docs
+title: Archive Files
+---
+
+# {{ page.title }}
+
+
+## Context and Problem Statement
+
+Docspell should support archive files, such as zip files, that
+contain the actual files that matter. It should extract their
+contents automatically.
+
+Since docspell should never drop or modify user data, the archive file
+must be present in the database. And it must be possible to download
+the file unmodified.
+
+On the other hand, the files inside it need to be text analysed and
+converted to pdf files.
+
+## Decision Outcome
+
+There is currently a table `attachment_source` which holds references
+to "original" files. These are the files as uploaded by the user,
+before being converted to pdf. Archive files add a subtlety: in case
+of an archive, an `attachment_source` is the original (non-archive)
+file inside an archive.
+
+The archive file itself will be stored in a separate table `attachment_archive`.
+
+Example: uploading a `files.zip` ZIP file containing `report.jpg`:
+
+- `attachment_source`: report.jpg
+- `attachment`: report.pdf
+- `attachment_archive`: files.zip
+
+Archives may contain other archives. In that case the inner archives
+will not be saved. The archive file is extracted recursively until no
+known archive file is found.
+
+## Initial Support
+
+Initial support is implemented for ZIP and EML (e-mail) files.
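The recursive extraction described in the ADR above can be sketched roughly as follows. This is an illustration only, not docspell's code: it handles ZIP files but not EML, the function names `extract_leaf_files` and `make_zip` are made up for this example, and recognising archives by the `.zip` extension plus a content check is an assumption of the sketch.

```python
import io
import zipfile


def extract_leaf_files(name, data):
    """Return (name, bytes) for every non-archive file.

    Nested ZIP files are opened recursively and are not part of the
    result themselves, mirroring "inner archives will not be saved".
    """
    is_zip = name.lower().endswith(".zip") and zipfile.is_zipfile(io.BytesIO(data))
    if not is_zip:
        return [(name, data)]
    leaves = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            leaves.extend(extract_leaf_files(info.filename, zf.read(info)))
    return leaves


def make_zip(entries):
    """Helper for the example: build an in-memory ZIP from {name: bytes}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for entry_name, payload in entries.items():
            zf.writestr(entry_name, payload)
    return buf.getvalue()
```

Checking the file extension as well as the content avoids treating ZIP-based office documents as archives, which a content check alone would do. Only the outermost upload would be kept as the archive; the returned entries correspond to what the ADR calls original files.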
@ -25,6 +25,15 @@ compete on getting the next job from the queue. After a job finishes
 and no job is waiting in the queue, joex will sleep until notified
 again. It will also periodically notify itself as a fallback.

+## Task vs Job
+
+Just for the sake of this document, a task denotes the code that has
+to be executed or the thing that has to be done. It becomes a job
+once it is submitted into the queue, from where it will eventually be
+picked up and executed. A job maintains a state and other things,
+while a task is just code.
+
+
 ## Scheduler and Queue

 The scheduler is the part that runs and monitors the long running
@ -115,6 +124,15 @@ reach a joex component. This periodic wakup is just to ensure that
 jobs are eventually run.


+## Periodic Tasks
+
+The job executor can execute tasks periodically. These tasks are
+stored in the database such that they can be submitted into the job
+queue. Multiple job executors can run at once, but only one is ever
+working on a given task, so a periodic task is never submitted twice.
+It is also not submitted if its previous run has not finished yet.
+
+
 ## Starting on demand

 The job executor and rest server can be started multiple times. This
@ -129,6 +147,7 @@ all have unique `app-id`s.
 Once the files have been processed you can stop the additional
 executors.

+
 ## Shutting down

 If a job executor is sleeping and not executing any jobs, you can just
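The distinction between task and job, and the rule that a periodic task is not submitted while a previous job for it is unfinished, can be illustrated with a small in-memory model. This is a hedged sketch under simplifying assumptions: the `Job` and `Queue` classes, the state names and `submit_periodic` are all hypothetical, and the real system keeps this state in the database shared by several executors.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    """A submitted task: it carries state, while the task is just code."""
    task_name: str
    state: str = "waiting"  # waiting -> running -> done


@dataclass
class Queue:
    tasks: Dict[str, Callable[[], None]]
    jobs: List[Job] = field(default_factory=list)

    def submit_periodic(self, task_name: str) -> bool:
        """Submit a job unless one for the same task is still unfinished."""
        unfinished = any(
            j.task_name == task_name and j.state != "done" for j in self.jobs
        )
        if unfinished:
            return False
        self.jobs.append(Job(task_name))
        return True

    def run_next(self) -> Optional[Job]:
        """Pick the first waiting job, run its task code, mark it done."""
        for job in self.jobs:
            if job.state == "waiting":
                job.state = "running"
                self.tasks[job.task_name]()
                job.state = "done"
                return job
        return None
```

A second `submit_periodic` call for the same task returns `False` until `run_next` has finished the earlier job.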
@ -28,6 +28,9 @@ title: Features and Limitations
 - Images (jpg, png, tiff)
 - HTML
 - text/* (treated as Markdown)
+- zip
+- [eml](https://en.wikipedia.org/wiki/Email#Filename_extensions)
+(e-mail files in plain text MIME)
 - Tools:
 - Watch a folder: watch folders for changes and send files to docspell
 - Firefox plugin: right click on a link and send the file to docspell