+++
title = "Joex"
description = "More information about the job executor component."
weight = 90
insert_anchor_links = "right"
[extra]
mktoc = true
+++

Joex is short for *Job Executor* and it is the component managing long
running tasks in docspell. One of these long running tasks is the file
processing task.

One joex component handles the processing of all files of all
collectives/users. It requires far more resources than the rest
server component. Therefore the number of jobs that can run in
parallel is limited with respect to the hardware it is running on.

For larger installations, it is probably better to run several joex
components on different machines. That works out of the box, as long
as all components point to the same database and use different
`app-id`s (see [configuring
docspell](@/docs/configure/_index.md#app-id)).
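
For example, a second joex instance could run with a config that
differs from the first one only in its `app-id` and `base-url`. A
minimal sketch (the values are placeholders; the jdbc settings must
point to the same database on all instances):

```
# joex-2.conf -- an additional executor on another machine
docspell.joex {
  app-id = "joex2"
  base-url = "http://joex2.example.com:7878"
  jdbc.url = "jdbc:postgresql://dbhost:5432/docspell"
}
```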

When files are submitted to docspell, they are stored in the database
and all known joex components are notified about new work. They then
compete for the next job in the queue. After a job finishes and no
job is waiting in the queue, joex will sleep until notified again. It
will also periodically notify itself as a fallback.
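
As a sketch, such a notification can also be triggered manually
through the joex REST interface (the `notify` route and port below
are assumptions based on the default setup):

```
curl -XPOST "http://localhost:7878/api/v1/notify"
```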

## Task vs Job

For the purposes of this document, a *task* denotes the code that has
to be executed, or the thing that has to be done. It becomes a *job*
once it is submitted into the queue, from where it will eventually be
picked up and executed. A job maintains state and other metadata,
while a task is just code.

## Scheduler and Queue

The scheduler is the part that runs and monitors the long running
jobs. It works together with the job queue, which defines which job
to take next.

To create a somewhat fair distribution among multiple collectives, a
collective is first chosen in a simple round-robin way. Then a job
from this collective is chosen by priority.

There are only two priorities: low and high. A simple *counting
scheme* determines whether a low or high priority job is selected
next. The default is `4, 1`, meaning to first select 4 high priority
jobs and then 1 low priority job, then start over. If no job of the
current priority exists, it falls back to the other priority.
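
For example, with the default `4, 1` and enough jobs of both
priorities waiting, the selection order looks like this:

```
high, high, high, high, low, high, high, high, high, low, ...
```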

The priority can be set on a *Source* (see
[uploads](@/docs/webapp/uploading.md)). Uploading through the web
application will always use priority *high*. The idea is that jobs
submitted while logged in are more important than those submitted
when not logged in.

## Scheduler Config

The relevant part of the config file regarding the scheduler is shown
below with some explanations.

```
docspell.joex {
  # other settings left out for brevity
  scheduler {

    # Number of processing allowed in parallel.
    pool-size = 2

    # A counting scheme determines the ratio of how high- and low-prio
    # jobs are run. For example: 4,1 means run 4 high prio jobs, then
    # 1 low prio and then start over.
    counting-scheme = "4,1"

    # How often a failed job should be retried until it enters failed
    # state. If a job fails, it becomes "stuck" and will be retried
    # after a delay.
    retries = 5

    # The delay until the next try is performed for a failed job. This
    # delay is increased exponentially with the number of retries.
    retry-delay = "1 minute"

    # The queue size of log statements from a job.
    log-buffer-size = 500

    # If no job is left in the queue, the scheduler will wait until a
    # notify is requested (using the REST interface). To also retry
    # stuck jobs, it will notify itself periodically.
    wakeup-period = "30 minutes"
  }
}
```

The `pool-size` setting determines how many jobs run in parallel. You
need to play with this setting on your machine to find an optimal
value.

The `counting-scheme` determines for all collectives how to select
between high and low priority jobs, as explained above. It is
currently not possible to define that per collective.

If a job fails, it will be set to *stuck* state and retried by the
scheduler. The `retries` setting defines how many times a job is
retried until it enters the final *failed* state. The scheduler waits
some time until running the next try. This delay is given by
`retry-delay`. It is the initial delay, the time until the first
retry (the second attempt). This time increases exponentially with
the number of retries.
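
As a worked example, assume `retries = 5`, `retry-delay = "1 minute"`
and a delay that doubles with each retry (the exact growth is an
implementation detail). A job that keeps failing would then run like
this:

```
initial run              -> fails, becomes "stuck"
retry 1 after  1 minute  -> fails
retry 2 after  2 minutes -> fails
retry 3 after  4 minutes -> fails
retry 4 after  8 minutes -> fails
retry 5 after 16 minutes -> fails, job enters the final *failed* state
```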

The jobs log what they are doing, and these log events are picked up
and stored in the database asynchronously: they are buffered in a
queue, and another thread consumes this queue and writes the events
to the database. The `log-buffer-size` determines the size of this
queue.

Finally, there is a `wakeup-period` that determines at what interval
the joex component notifies itself to look for new jobs. If jobs get
stuck and joex is not notified externally, it could miss retrying
them. Also, since networks are not reliable, a notification may not
reach a joex component. This periodic wakeup is just to ensure that
jobs are eventually run.

## Periodic Tasks

The job executor can execute tasks periodically. These tasks are
stored in the database such that they can be submitted into the job
queue. Multiple job executors can run at once, but only one of them
ever works on a given task. So a periodic task is never submitted
twice. It is also not submitted if a previous run has not finished
yet.

## Starting on demand

The job executor and rest server can be started multiple times. This
is especially useful for the job executor. For example, when
submitting a lot of files in a short time, you can simply start more
job executors on other computers on your network. Maybe use your
laptop to help with processing for a while.

You have to make sure that all of them connect to the same database
and that all have unique `app-id`s.

Once the files have been processed, you can stop the additional
executors.
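
A sketch of such a session (how the config file is passed depends on
your installation; `joex-laptop.conf` is a hypothetical config with
its own unique `app-id`):

```
# on the laptop: start an additional executor
./docspell-joex*/bin/docspell-joex /path/to/joex-laptop.conf

# ... once the queue is empty again, stop it with Ctrl-C or a
# graceful shutdown request (see below)
```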

## Shutting down

If a job executor is sleeping and not executing any jobs, you can just
quit using SIGTERM or `Ctrl-C` when running in a terminal. But if
there are jobs currently executing, it is advisable to initiate a
graceful shutdown. The job executor will then stop taking new jobs
from the queue, but it will wait until all running jobs have completed
before shutting down.

This can be done by sending an HTTP POST request to the API of this
job executor:

```
curl -XPOST "http://localhost:7878/api/v1/shutdownAndExit"
```

If joex receives this request, it will immediately stop taking new
jobs and will quit when all running jobs are done.

If a job executor gets terminated while there are running jobs, those
jobs remain in their current state, marked to be executed by this job
executor. To fix this, start the job executor again. It will search
for all jobs that are marked with its id and put them back into
waiting state. Then send a graceful shutdown request as shown above.
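
A minimal sketch of this recovery, assuming the same setup as above:

```
# start the executor again; on startup it puts jobs marked with its
# id back into waiting state
./docspell-joex*/bin/docspell-joex /path/to/config.conf

# then let it finish gracefully
curl -XPOST "http://localhost:7878/api/v1/shutdownAndExit"
```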