+++
title = "Joex"
description = "More information about the job executor component."
weight = 90
insert_anchor_links = "right"
[extra]
mktoc = true
+++

Joex is short for *Job Executor* and it is the component managing long
running tasks in docspell. One of these long running tasks is the file
processing task.

One joex component handles the processing of all files of all
collectives/users. It requires far more resources than the rest server
component. Therefore, the number of jobs that can run in parallel is
limited with respect to the hardware it is running on.

For larger installations, it is probably better to run several joex
components on different machines. That works out of the box, as long
as all components point to the same database and use different
`app-id`s (see [configuring
docspell](@/docs/configure/_index.md#app-id)).

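For example, two joex instances on different machines might use
configuration files along these lines (a sketch only: host names and
jdbc values are placeholders, and the keys are assumed to match the
default configuration file):

```
docspell.joex {
  app-id = "joex1"
  base-url = "http://machine1:7878"
  jdbc {
    url = "jdbc:postgresql://dbserver:5432/docspell"
    user = "docspell"
    password = "..."
  }
}
```

and on the second machine:

```
docspell.joex {
  app-id = "joex2"
  base-url = "http://machine2:7878"
  jdbc {
    # must point to the same database as all other components
    url = "jdbc:postgresql://dbserver:5432/docspell"
    user = "docspell"
    password = "..."
  }
}
```
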
When files are submitted to docspell, they are stored in the database
and all known joex components are notified about new work. Then they
compete to get the next job from the queue. After a job finishes
and no job is waiting in the queue, joex will sleep until notified
again. It will also periodically notify itself as a fallback.

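The rest server takes care of this notification automatically. If you
want to poke a joex instance manually, it can be notified through its
small REST api; a sketch, assuming the notify route sits next to the
documented `shutdownAndExit` route under `/api/v1` (verify the path
against your version):

```
# ask this joex instance to look for new work right away
curl -XPOST "http://localhost:7878/api/v1/notify"
```
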
## Task vs Job

Just for the sake of this document, a task denotes the code that has
to be executed or the thing that has to be done. It becomes a job once
it is submitted into the queue, from where it will eventually be
picked up and executed. A job maintains a state and other things,
while a task is just code.


## Scheduler and Queue

The scheduler is the part that runs and monitors the long running
jobs. It works together with the job queue, which defines what job to
take next.

To create a somewhat fair distribution among multiple collectives, a
collective is first chosen in a simple round-robin way. Then a job
from this collective is chosen by priority.

There are only two priorities: low and high. A simple *counting
scheme* determines whether a low or high priority job is selected
next. The default is `4, 1`, meaning to first select 4 high priority
jobs and then 1 low priority job, and then start over. If no such job
exists, it falls back to the other priority.

The priority can be set on a *Source* (see
[uploads](@/docs/webapp/uploading.md)). Uploading through the web
application will always use priority *high*. The idea is that while
logged in, jobs are more important than those submitted when not
logged in.


## Scheduler Config

The relevant part of the config file regarding the scheduler is shown
below with some explanations.

```
docspell.joex {
  # other settings left out for brevity

  scheduler {

    # Number of processing allowed in parallel.
    pool-size = 2

    # A counting scheme determines the ratio of how high- and low-prio
    # jobs are run. For example: 4,1 means run 4 high prio jobs, then
    # 1 low prio and then start over.
    counting-scheme = "4,1"

    # How often a failed job should be retried until it enters failed
    # state. If a job fails, it becomes "stuck" and will be retried
    # after a delay.
    retries = 5

    # The delay until the next try is performed for a failed job. This
    # delay is increased exponentially with the number of retries.
    retry-delay = "1 minute"

    # The queue size of log statements from a job.
    log-buffer-size = 500

    # If no job is left in the queue, the scheduler will wait until a
    # notify is requested (using the REST interface). To also retry
    # stuck jobs, it will notify itself periodically.
    wakeup-period = "30 minutes"
  }
}
```

The `pool-size` setting determines how many jobs run in parallel. You
need to play with this setting on your machine to find an optimal
value.

The `counting-scheme` determines for all collectives how to select
between high and low priority jobs, as explained above. It is
currently not possible to define that per collective.

If a job fails, it will be set to *stuck* state and retried by the
scheduler. The `retries` setting defines how many times a job is
retried until it enters the final *failed* state. The scheduler waits
some time until running the next try. This delay is given by
`retry-delay`. This is the initial delay, the time until the first
retry (the second attempt). This time increases exponentially with
the number of retries.

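As a rough example (assuming the delay simply doubles on each retry;
the exact growth factor is an assumption here), the defaults
`retry-delay = "1 minute"` and `retries = 5` would mean waiting
roughly 1, 2, 4, 8 and 16 minutes between the successive attempts.
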
The jobs log what they are doing, which is picked up and stored in
the database asynchronously. The log events are buffered in a queue,
and another thread consumes this queue and stores them in the
database. The `log-buffer-size` determines the size of this queue.

Finally, there is a `wakeup-period` which determines at what interval
the joex component notifies itself to look for new jobs. If jobs get
stuck and joex is not notified externally, they could be missed for
retry. Also, since networks are not reliable, a notification may not
reach a joex component. This periodic wakeup is just to ensure that
jobs are eventually run.


## Periodic Tasks

The job executor can execute tasks periodically. These tasks are
stored in the database such that they can be submitted into the job
queue. Multiple job executors can run at once, but only one of them
ever works on a given periodic task, so a periodic task is never
submitted twice. It is also not submitted if a previous run has not
finished yet.


## Starting on demand

The job executor and rest server can be started multiple times. This
is especially useful for the job executor. For example, when
submitting a lot of files in a short time, you can simply start up
more job executors on other computers on your network. Maybe use your
laptop to help with processing for a while.

You have to make sure that all of them connect to the same database
and that all have unique `app-id`s.

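On a helper machine, starting an additional joex could look like this
(a sketch, assuming the zip/tar installation layout; `joex-2.conf` is
a hypothetical config file that contains the same jdbc settings but a
unique `app-id`):

```
# unpack the joex archive, then start it with its own config file
# (alternatively: -Dconfig.file=/path/to/joex-2.conf)
./docspell-joex-*/bin/docspell-joex /path/to/joex-2.conf
```
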
Once the files have been processed, you can stop the additional
executors.


## Shutting down

If a job executor is sleeping and not executing any jobs, you can just
quit using SIGTERM or `Ctrl-C` when running in a terminal. But if
there are jobs currently executing, it is advisable to initiate a
graceful shutdown. The job executor will then stop taking new jobs
from the queue, but it will wait until all running jobs have completed
before shutting down.

This can be done by sending an HTTP POST request to the API of this
job executor:

```
curl -XPOST "http://localhost:7878/api/v1/shutdownAndExit"
```

If joex receives this request, it will immediately stop taking new
jobs and quit when all running jobs are done.

If a job executor gets terminated while there are running jobs, those
jobs remain in their current state, marked to be executed by this job
executor. To fix this, start the job executor again. It will search
for all jobs that are marked with its id and put them back into
waiting state. Then send a graceful shutdown request as shown above.