diff --git a/modules/microsite/docs/dev/adr/0012_periodic_tasks.md b/modules/microsite/docs/dev/adr/0012_periodic_tasks.md
index 5edd559c..ccb7ec32 100644
--- a/modules/microsite/docs/dev/adr/0012_periodic_tasks.md
+++ b/modules/microsite/docs/dev/adr/0012_periodic_tasks.md
@@ -15,7 +15,8 @@ immediately – as long as there are enough resource.
 What is missing, is a component that maintains periodic tasks. The
 reason for this is to have house keeping tasks that run regularily and
 clean up stale or unused data. Later, users should be able to create
-periodic tasks, for example to read e-mails from an inbox.
+periodic tasks, for example to read e-mails from an inbox or to be
+notified of due items.
 
 The problem is again, that it must work with multiple job executor
 instances running at the same time. This is the same pattern as with
@@ -38,14 +39,16 @@ For internal housekeeping tasks, it may suffice to reuse the existing
 `job` queue by adding more fields such that a job may be considered
 periodic. But this conflates with what the `Scheduler` is doing now
 (executing tasks as soon as possible while being bound to some
-resources) with a completely different subject.
+resource limits) with a completely different subject.
 
 There will be a new `PeriodicScheduler` that works on a new table in
 the database that is representing periodic tasks. This table will
-share fields with the `job` table to be able to create `RJob`
-instances. This new component is only taking care of periodically
-submitting jobs to the job queue such that the `Scheduler` will
-eventually pick it up and run it.
+share fields with the `job` table to be able to create `RJob` records.
+This new component is only taking care of periodically submitting jobs
+to the job queue such that the `Scheduler` will eventually pick it up
+and run it. If the tasks cannot run (for example due to resource
+limitation), the periodic scheduler can do nothing but wait and try
+next time.
 
 ```sql
 CREATE TABLE "periodic_task" (
@@ -65,11 +68,11 @@ CREATE TABLE "periodic_task" (
 );
 ```
 
-Preparing for other features, periodic tasks will be created by users.
-It should be possible to disable/enable them. The next 6 properties
-are needed to insert jobs into the `job` table. The `worker` field
-(and `marked`) are used to mark a periodic job as "being worked on by
-a job executor".
+Preparing for other features, at some point periodic tasks will be
+created by users. It should be possible to disable/enable them. The
+next 6 properties are needed to insert jobs into the `job` table. The
+`worker` field (and `marked`) are used to mark a periodic job as
+"being worked on by a job executor".
 
 The `timer` is the schedule, which is a
 [systemd-like](https://man.cx/systemd.time#heading7) calendar event
diff --git a/modules/microsite/docs/dev/adr/0013_archive_files.md b/modules/microsite/docs/dev/adr/0013_archive_files.md
new file mode 100644
index 00000000..3a959c16
--- /dev/null
+++ b/modules/microsite/docs/dev/adr/0013_archive_files.md
@@ -0,0 +1,44 @@
+---
+layout: docs
+title: Archive Files
+---
+
+# {{ page.title }}
+
+
+## Context and Problem Statement
+
+Docspell should have support for files that contain the actual files
+that matter, like zip files and similar archives. It should extract
+their contents automatically.
+
+Since docspell should never drop or modify user data, the archive file
+must be present in the database. And it must be possible to download
+the file unmodified.
+
+On the other hand, the files inside the archive need to be text
+analysed and converted to pdf files.
+
+## Decision Outcome
+
+There is currently a table `attachment_source` which holds references
+to "original" files. These are the files as uploaded by the user,
+before being converted to pdf. Archive files add a subtlety to this:
+in case of an archive, an `attachment_source` is the original
+(non-archive) file inside an archive.
+
+The archive file itself will be stored in a separate table `attachment_archive`.
+
+Example: uploading a `files.zip` ZIP file containing `report.jpg`:
+
+- `attachment_source`: report.jpg
+- `attachment`: report.pdf
+- `attachment_archive`: files.zip
+
+Archives may contain other archives. Then the inner archives will not
+be saved. The archive file is extracted recursively, until there is no
+known archive file found.
+
+## Initial Support
+
+Initial support is implemented for ZIP and EML (e-mail) files.
diff --git a/modules/microsite/docs/doc/joex.md b/modules/microsite/docs/doc/joex.md
index 96bca7b0..23309e6e 100644
--- a/modules/microsite/docs/doc/joex.md
+++ b/modules/microsite/docs/doc/joex.md
@@ -25,6 +25,15 @@ compete on getting the next job from the queue. After a job finishes
 and no job is waiting in the queue, joex will sleep until notified
 again. It will also periodically notify itself as a fallback.
 
+## Task vs Job
+
+Just for the sake of this document, a task denotes the code that has
+to be executed or the thing that has to be done. It becomes a job
+once it is submitted into the queue, from where it will be picked
+up and executed eventually. A job maintains a state and other things,
+while a task is just code.
+
+
 ## Scheduler and Queue
 
 The scheduler is the part that runs and monitors the long running
@@ -115,6 +124,15 @@ reach a joex component. This periodic wakup is just to ensure that
 jobs are eventually run.
 
 
+## Periodic Tasks
+
+The job executor can execute tasks periodically. These tasks are
+stored in the database such that they can be submitted into the job
+queue. Multiple job executors can run at once, but only one is ever
+working on a given task, so a periodic task is never submitted twice.
+It is also not submitted if a previous task has not finished yet.
+
+
 ## Starting on demand
 
 The job executor and rest server can be started multiple times. This
@@ -129,6 +147,7 @@ all have unique `app-id`s.
 Once the files have been processced you can stop the additional
 executors.
 
+
 ## Shutting down
 
 If a job executor is sleeping and not executing any jobs, you can just
diff --git a/modules/microsite/docs/features.md b/modules/microsite/docs/features.md
index 8390db7f..6571be01 100644
--- a/modules/microsite/docs/features.md
+++ b/modules/microsite/docs/features.md
@@ -28,6 +28,9 @@ title: Features and Limitations
   - Images (jpg, png, tiff)
   - HTML
   - text/* (treated as Markdown)
+  - zip
+  - [eml](https://en.wikipedia.org/wiki/Email#Filename_extensions)
+    (e-mail files in plain text MIME)
 - Tools:
   - Watch a folder: watch folders for changes and send files to docspell
   - Firefox plugin: right click on a link and send the file to docspell