Upgrade microsite

2025-11-04 12:30:12 +00:00 · 2019-12-29 23:37:32 +01:00
parent 2001cca88b
commit 57e274e2b0
42 changed files with 599 additions and 70 deletions
--- a/modules/microsite/docs/doc/configure.md
+++ b/modules/microsite/docs/doc/configure.md
@@ -0,0 +1,261 @@
+---
+layout: docs
+title: Configuring
+---
+
+# {{ page.title }}
+
+Docspell's executable can take one argument – a configuration file. If
+that is not given, the defaults are used. The config file overrides
+default values, so only values that differ from the defaults are
+necessary.
+
+This applies to the restserver and the joex as well.
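+
+As a minimal sketch, a custom config file for the REST server could
+override just a few defaults. The keys are the ones described on this
+page; the values (host name, user, password) are placeholders made up
+for illustration:
+
+```
+docspell.server {
+  # explicit and stable identifier for this instance
+  app-id = "rest1"
+
+  # public url of this REST server (placeholder)
+  baseurl = "https://docspell.example.com"
+
+  backend.jdbc {
+    url = "jdbc:postgresql://localhost:5432/docspelldb"
+    user = "dbuser"      # hypothetical user
+    password = "dbpass"  # hypothetical password
+  }
+}
+```
+
+All settings not mentioned in the file keep their default values.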
+
+## Important Config Options
+
+The configuration of both components uses separate namespaces. The
+configuration for the REST server is below `docspell.server`, while
+the one for joex is below `docspell.joex`.
+
+### JDBC
+
+This configures the connection to the database. This has to be
+specified for the rest server and joex. By default, an H2 database in
+the `/tmp` directory is configured.
+
+The config looks like this (both components):
+
+```
+docspell.joex.jdbc {
+  url = ...
+  user = ...
+  password = ...
+}
+
+docspell.server.backend.jdbc {
+  url = ...
+  user = ...
+  password = ...
+}
+```
+
+The `url` is the connection to the database. It must start with
+`jdbc`, followed by the name of the database. The rest is specific to
+the database used: it is either a path to a file for H2 or a
+host/database url for MariaDB and PostgreSQL.
+
+When using H2, the user is `sa`, the password can be empty and the url
+must include these options:
+
+```
+;MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE;AUTO_SERVER=TRUE
+```
+
+#### Examples
+
+PostgreSQL:
+```
+url = "jdbc:postgresql://localhost:5432/docspelldb"
+```
+
+MariaDB:
+```
+url = "jdbc:mariadb://localhost:3306/docspelldb"
+```
+
+H2:
+```
+url = "jdbc:h2:///path/to/a/file.db;MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE;AUTO_SERVER=TRUE"
+```
+
+### Bind
+
+The host and port the http server binds to. This applies to both
+components. The joex component also exposes a small REST api to
+inspect its state and notify the scheduler.
+
+```
+docspell.server.bind {
+  address = localhost
+  port = 7880
+}
+docspell.joex.bind {
+  address = localhost
+  port = 7878
+}
+```
+
+By default, it binds to `localhost` and some predefined port. This
+must be changed if the components run on different machines.
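+
+For example, assuming the joex component runs on its own machine and
+should accept connections on all network interfaces, the bind setting
+could look like the following sketch (the address `0.0.0.0` is an
+assumption for illustration; a specific interface address works, too):
+
+```
+docspell.joex.bind {
+  address = "0.0.0.0"
+  port = 7878
+}
+```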
+
+### baseurl
+
+The base url is an important setting that defines the http URL where
+the corresponding component can be reached. It applies to both
+components. For a joex component, the url must be resolvable from a
+REST server component. The REST server also uses this url to create
+absolute urls and to configure the authentication cookie.
+
+By default it is built using the information from the `bind` setting.
+
+
+```
+docspell.server.baseurl = ...
+docspell.joex.baseurl = ...
+```
+
+#### Examples
+
+```
+docspell.server.baseurl = "https://docspell.example.com"
+docspell.joex.baseurl = "http://192.168.101.10"
+```
+
+
+### app-id
+
+The `app-id` is the identifier of the corresponding instance. It *must
+be unique* for all instances. By default the REST server uses `rest1`
+and joex `joex1`. It is recommended to overwrite this setting to have
+an explicit and stable identifier.
+
+```
+docspell.server.app-id = "rest1"
+docspell.joex.app-id = "joex1"
+```
+
+### registration options
+
+This defines if and how new users can create accounts. There are 3
+options:
+
+- *closed*: no new user can sign up
+- *open*: new users can sign up
+- *invite*: new users can sign up but require an invitation key
+
+This applies only to the REST server component.
+
+```
+docspell.server.signup {
+  mode = "open"
+
+  # If mode == 'invite', a password must be provided to generate
+  # invitation keys. It must not be empty.
+  new-invite-password = ""
+
+  # If mode == 'invite', this is the period an invitation token is
+  # considered valid.
+  invite-time = "3 days"
+}
+```
+
+The mode `invite` is intended to open the application only to some
+users. The admin can create these invitation keys and distribute them
+to the desired people. For this, the `new-invite-password` must be
+given. The idea is that only the person who installs docspell knows
+this. If it is not set, then invitation won't work. New invitation
+keys can be generated from within the web application or via REST
+calls (using `curl`, for example).
+
+```
+curl -X POST -d '{"password":"blabla"}' "http://localhost:7880/api/v1/open/signup/newinvite"
+```
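+
+A config enabling the invite mode could look like this sketch. The
+password `blabla` is only the placeholder used in the `curl` call
+above; choose your own secret value:
+
+```
+docspell.server.signup {
+  mode = "invite"
+
+  # required to generate invitation keys; must not be empty
+  new-invite-password = "blabla"
+
+  invite-time = "3 days"
+}
+```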
+
+### Authentication
+
+Authentication works in two ways:
+
+- with an account-name / password pair
+- with an authentication token
+
+The initial authentication must occur with an account-name/password
+pair. This will generate an authentication token which is valid for
+some time. Subsequent calls to secured routes can use this token. The
+token can be given as a normal http header or via a cookie header.
+
+These settings apply only to the REST server.
+
+```
+docspell.server.auth {
+  server-secret = "hex:caffee" # or "b64:Y2FmZmVlCg=="
+  session-valid = "5 minutes"
+}
+```
+
+The `server-secret` is used to sign the token. If multiple REST
+servers are deployed, all must share the same server secret. Otherwise
+tokens from one instance are not valid on another instance. The secret
+can be given in hex form or as a Base64 encoded string. Use the prefix
+`hex:` or `b64:`, respectively.
+
+The `session-valid` setting determines how long a token is valid. This
+can be just a few minutes, because the web application obtains new
+tokens periodically. So a short time is recommended.
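+
+As a sketch, two REST servers deployed side by side would share the
+secret but use different ids. The second id `rest2` and the secret
+value are assumptions for illustration; each block belongs in the
+config file of the respective instance:
+
+```
+# config file of the first REST server
+docspell.server {
+  app-id = "rest1"
+  auth.server-secret = "hex:caffee"
+}
+
+# config file of the second REST server
+docspell.server {
+  app-id = "rest2"
+  auth.server-secret = "hex:caffee"
+}
+```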
+
+
+## File Format
+
+The format of the configuration files can be
+[HOCON](https://github.com/lightbend/config/blob/master/HOCON.md#hocon-human-optimized-config-object-notation),
+JSON or whatever the used [config
+library](https://github.com/lightbend/config) understands. The default
+values below are in HOCON format, which is recommended, since it
+allows comments and has some [advanced
+features](https://github.com/lightbend/config/blob/master/README.md#features-of-hocon). Please
+refer to their documentation for more on this.
+
+Here are the default configurations.
+
+
+## Default Config
+
+### Rest Server
+
+```
+{% include server.conf %}
+```
+
+### Joex
+
+```
+{% include joex.conf %}
+```
+
+## Logging
+
+By default, docspell logs to stdout. This works well, when managed by
+systemd or other inits. Logging is done by
+[logback](https://logback.qos.ch/). Please refer to its documentation
+for how to configure logging.
+
+If you have created your own logback config file, pass it as an
+argument to the executable using this syntax:
+
+```
+/path/to/docspell -Dlogback.configurationFile=/path/to/your/logging-config-file
+```
+
+To get started, the default config looks like this:
+
+``` xml
+<configuration>
+  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
+    <withJansi>true</withJansi>
+
+    <encoder>
+      <pattern>[%thread] %highlight(%-5level) %cyan(%logger{15}) - %msg %n</pattern>
+    </encoder>
+  </appender>
+
+  <logger name="docspell" level="debug" />
+  <root level="INFO">
+    <appender-ref ref="STDOUT" />
+  </root>
+</configuration>
+```
+
+The `<root level="INFO">` means that only log statements with level
+"INFO" or higher will be printed. But the `<logger name="docspell"
+level="debug">` above says that for loggers with name "docspell",
+statements with level "DEBUG" will be printed, too.
--- a/modules/microsite/docs/doc/curate.md
+++ b/modules/microsite/docs/doc/curate.md
@@ -0,0 +1,77 @@
+---
+layout: docs
+title: Find and Review
+---
+
+# {{page.title}}
+
+Curating the meta data of your items helps to find them later. This
+page describes how you can quickly go through those items and correct
+or amend them with existing data.
+
+## Select New items
+
+After files have been uploaded and the job executor has created the
+corresponding items, they will show up on the main page. All items
+the job executor has created are initially marked as *New*. The option
+*only New* in the left search menu can be used to select only new
+items:
+
+<div class="thumbnail">
+  <img src="../img/docspell-curate-1.jpg">
+</div>
+
+
+## Check selected items
+
+Then you can go through all new items and check their metadata: Click
+on the first item to open the detail view. This shows the documents
+and the meta data in the header.
+
+<div class="thumbnail">
+  <img src="../img/docspell-curate-2.jpg">
+</div>
+
+
+## Modify if necessary
+
+To change something, click the *Edit* button in the menu above the
+document view. This will open a form next to your documents. You can
+compare the data with the documents and change as you like. Since the
+item status is *New*, you'll see the suggestions docspell found during
+processing. If there were multiple candidates, you can select another
+one by clicking its name in the suggestion list.
+
+<div class="thumbnail">
+  <img src="../img/docspell-curate-3.jpg">
+</div>
+
+
+When you change something in the form, it is applied immediately. Only
+text fields require a click on the *Save* symbol next to the field.
+
+
+## Confirm
+
+If everything looks good, click the *Confirm* button to confirm the
+current data. The *New* status goes away and also the suggestions are
+hidden in this state. You can always go back by clicking the
+*Unconfirm* button.
+
+<div class="thumbnail">
+  <img src="../img/docspell-curate-5.jpg">
+</div>
+
+
+## Proceed with next item
+
+To look at the next item in the search results, click the *Next*
+button in the menu (next to the *Edit* button). Clicking *Next* keeps
+the current view, so you can continue checking the data. If you are on
+the last item, the view switches to the listing view when clicking
+*Next*.
+
+<div class="thumbnail">
+  <img src="../img/docspell-curate-6.jpg">
+</div>
--- a/modules/microsite/docs/doc/install.md
+++ b/modules/microsite/docs/doc/install.md
@@ -0,0 +1,218 @@
+---
+layout: docs
+title: Installation
+---
+
+# {{ page.title }}
+
+This page contains detailed installation instructions. For a quick
+start, refer to [this page](../getit.html).
+
+Docspell has been developed and tested on a GNU/Linux system. It may
+run on Windows and MacOS machines, too (ghostscript and tesseract are
+available on these systems). But I've never tried.
+
+Docspell consists of two components that are started in separate
+processes:
+
+1. *REST Server* This is the main application, providing the REST Api
+   and the web application.
+2. *Joex* (job executor) This is the component that does the document
+   processing.
+
+They can run on multiple machines. All REST server and Joex instances
+should be on the same network. It is not strictly required that they
+can reach each other, but if they can, the components notify each
+other about new or finished work.
+
+While this is possible, the simple setup is to start both components
+once on the same machine.
+
+The [download page](https://github.com/eikek/docspell/releases)
+provides pre-compiled packages and the [development page](dev.html)
+contains build instructions.
+
+
+## Prerequisites
+
+The two components have one prerequisite in common: they both require
+Java to run. While this is the only requirement for the *REST server*,
+the *Joex* component requires some more external programs.
+
+### Java
+
+Very often, Java is already installed. You can check this by opening a
+terminal and typing `java -version`. Otherwise install Java using your
+package manager or see [this site](https://adoptopenjdk.net/) for
+other options.
+
+It is enough to install the JRE. The JDK is required, if you want to
+build docspell from source.
+
+Docspell has been tested with Java version 1.8 (sometimes referred to
+as JRE 8 and JDK 8, respectively). The pre-built packages are also
+built using JDK 8. But a later version of Java should work as well.
+
+The next tools are only required on machines running the *Joex*
+component.
+
+### External Tools for Joex
+
+- [Ghostscript](http://pages.cs.wisc.edu/~ghost/) (the `gs` command)
+  is used to extract/convert PDF files into images that are then fed
+  to ocr. It is available on most GNU/Linux distributions.
+- [Unpaper](https://github.com/Flameeyes/unpaper) is a program that
+  pre-processes images to yield better results when doing ocr. If this
+  is not installed, docspell tries without it. However, it is
+  recommended to install, because it [improves text
+  extraction](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality)
+  (at the expense of a longer runtime).
+- [Tesseract](https://github.com/tesseract-ocr/tesseract) is the tool
+  doing the OCR (converts images into text). It is a widely used open
+  source OCR engine. Tesseract 3 and 4 should work with docspell; you
+  can adapt the command line in the configuration file, if necessary.
+
+
+### Example Debian
+
+On Debian this should install all joex requirements:
+
+``` bash
+sudo apt-get install ghostscript tesseract-ocr tesseract-ocr-deu tesseract-ocr-eng unpaper
+```
+
+## Database
+
+Both components must have access to a SQL database. Docspell
+supports these databases:
+
+- PostgreSQL
+- MariaDB
+- H2
+
+The H2 database is an interesting option for personal and mid-size
+setups, as it requires no additional work. It is integrated into
+docspell and works really well. It is also configured as the default
+database.
+
+For large installations, PostgreSQL or MariaDB is recommended. Create
+a database and a user with enough privileges (read, write, create
+table) to that database.
+
+When using H2, make sure that all components access the same database
+– the jdbc url must point to the same file. Then, it is important to
+add the options
+`;MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE;AUTO_SERVER=TRUE` at the end
+of the url. See the [default config](configure.html) for an example.
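+
+A sketch of the two config files sharing one H2 database follows. The
+file path is a placeholder; user and password follow the H2 values
+mentioned on the configuration page:
+
+```
+# in the REST server config file
+docspell.server.backend.jdbc {
+  url = "jdbc:h2:///path/to/a/file.db;MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE;AUTO_SERVER=TRUE"
+  user = "sa"
+  password = ""
+}
+
+# in the joex config file: the same url, so both use the same file
+docspell.joex.jdbc {
+  url = "jdbc:h2:///path/to/a/file.db;MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE;AUTO_SERVER=TRUE"
+  user = "sa"
+  password = ""
+}
+```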
+
+
+## Installing from ZIP files
+
+After extracting the zip files, you'll find a start script in the
+`bin/` folder.
+
+
+## Installing from DEB packages
+
+The DEB packages can be installed on Debian, or Debian based Distros:
+
+``` bash
+$ sudo dpkg -i docspell*.deb
+```
+
+Then the start scripts are in your `$PATH`. Run `docspell-restserver`
+or `docspell-joex` from a terminal window.
+
+The packages come with a systemd unit file that will be installed to
+autostart the services.
+
+
+## Running
+
+Run the start script (in the corresponding `bin/` directory when using
+the zip files):
+
+```
+$ ./docspell-restserver*/bin/docspell-restserver
+$ ./docspell-joex*/bin/docspell-joex
+```
+
+This will start up both components using the default configuration.
+The configuration should be adapted to your needs. For example, the
+database connection is configured to use an H2 database in the `/tmp`
+directory. Please refer to the [configuration page](configure.html)
+for how to create a custom config file. Once you have your config
+file, simply pass it as argument to the command:
+
+```
+$ ./docspell-restserver*/bin/docspell-restserver /path/to/server-config.conf
+$ ./docspell-joex*/bin/docspell-joex /path/to/joex-config.conf
+```
+
+After starting the rest server, you can reach the web application at
+path `/app/index.html`, so using default values it would be
+`http://localhost:7880/app/index.html`.
+
+You should be able to create a new account and sign in. Check the
+[configuration page](configure.html) to further customize docspell.
+
+
+### Options
+
+The start scripts support some options to configure the JVM. One often
+used setting is the maximum heap size of the JVM. By default, java
+determines it based on properties of the current machine. You can
+specify it by giving java startup options to the command:
+
+```
+$ ./docspell-restserver*/bin/docspell-restserver -J-Xmx1G -- /path/to/server-config.conf
+```
+
+This would limit the maximum heap to 1GB. The double dash (`--`)
+separates internal options and the arguments to the program. Another
+frequently used option is to change the default temp directory.
+Usually it is `/tmp`, but it may be desired to have a dedicated temp
+directory, which can be configured:
+
+```
+$ ./docspell-restserver*/bin/docspell-restserver -J-Xmx1G -Djava.io.tmpdir=/path/to/othertemp -- /path/to/server-config.conf
+```
+
+The command:
+
+```
+$ ./docspell-restserver*/bin/docspell-restserver -h
+```
+
+gives an overview of supported options.
+
+
+## Raspberry Pi, and similar
+
+Both components can run next to each other on a Raspberry Pi or
+similar device.
+
+
+### REST Server
+
+The REST server component runs very well on the Raspberry Pi and
+similar devices. It doesn't require many resources, because the heavy
+work is done by the joex components.
+
+
+### Joex
+
+Running the joex component on the Raspberry Pi is possible, but will
+result in long processing times. Tested on a RPi model 3 (4 cores, 1G
+RAM), processing a PDF (scanned with 300dpi) with two pages took
+9:52. You can speed it up considerably by uninstalling the `unpaper`
+command, because this step takes quite long. This, of course, reduces
+the quality of OCR. But without `unpaper` the same sample pdf was then
+processed in 1:24, which is about 8.5 minutes faster.
+
+You should limit the joex pool size to 1 and, depending on your model
+and the amount of RAM, set a heap size of at least 500M
+(`-J-Xmx500M`).
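+
+The pool size is part of the scheduler settings in the joex config
+file. A minimal sketch for such a device:
+
+```
+docspell.joex.scheduler {
+  # run only one job at a time
+  pool-size = 1
+}
+```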
+
+For personal setups, when you don't need the processing results asap,
+this can work well enough.
--- a/modules/microsite/docs/doc/joex.md
+++ b/modules/microsite/docs/doc/joex.md
@@ -0,0 +1,155 @@
+---
+layout: docs
+title: Joex
+---
+
+# {{ page.title }}
+
+Joex is short for *Job Executor* and it is the component managing long
+running tasks in docspell. One of these long running tasks is the file
+processing task.
+
+One joex component handles the processing of all files of all
+collectives/users. It requires many more resources than the rest
+server component. Therefore the number of jobs that can run in
+parallel is limited with respect to the hardware it is running on.
+
+For larger installations, it is probably better to run several joex
+components on different machines. That works out of the box, as long
+as all components point to the same database and use different
+`app-id`s (see [configuring docspell](./configure.html)).
+
+When files are submitted to docspell, they are stored in the database
+and all known joex components are notified about new work. Then they
+compete for the next job from the queue. After a job finishes
+and no job is waiting in the queue, joex will sleep until notified
+again. It will also periodically notify itself as a fallback.
+
+## Scheduler and Queue
+
+The scheduler is the part that runs and monitors the long running
+jobs. It works together with the job queue, which defines what job to
+take next.
+
+To create a somewhat fair distribution among multiple collectives, a
+collective is first chosen in a simple round-robin way. Then a job
+from this collective is chosen by priority.
+
+There are only two priorities: low and high. A simple *counting
+scheme* determines if a low prio or high prio job is selected
+next. The default is `4, 1`, meaning to first select 4 high priority
+jobs and then 1 low priority job, then starting over. If no such job
+exists, it falls back to the other priority.
+
+The priority can be set on a *Source* (see
+[uploads](uploading.html)). Uploading through the web application will
+always use priority *high*. The idea is that while logged in, jobs are
+more important than those submitted when not logged in.
+
+
+## Scheduler Config
+
+The relevant part of the config file regarding the scheduler is shown
+below with some explanations.
+
+```
+docspell.joex {
+  # other settings left out for brevity
+
+  scheduler {
+
+    # Number of processing allowed in parallel.
+    pool-size = 2
+
+    # A counting scheme determines the ratio of how high- and low-prio
+    # jobs are run. For example: 4,1 means run 4 high prio jobs, then
+    # 1 low prio and then start over.
+    counting-scheme = "4,1"
+
+    # How often a failed job should be retried until it enters failed
+    # state. If a job fails, it becomes "stuck" and will be retried
+    # after a delay.
+    retries = 5
+
+    # The delay until the next try is performed for a failed job. This
+    # delay is increased exponentially with the number of retries.
+    retry-delay = "1 minute"
+
+    # The queue size of log statements from a job.
+    log-buffer-size = 500
+
+    # If no job is left in the queue, the scheduler will wait until a
+    # notify is requested (using the REST interface). To also retry
+    # stuck jobs, it will notify itself periodically.
+    wakeup-period = "30 minutes"
+  }
+}
+```
+
+The `pool-size` setting determines how many jobs run in parallel. You
+need to play with this setting on your machine to find an optimal
+value.
+
+The `counting-scheme` determines for all collectives how to select
+between high and low priority jobs; as explained above. It is
+currently not possible to define that per collective.
+
+If a job fails, it will be set to *stuck* state and retried by the
+scheduler. The `retries` setting defines how many times a job is
+retried until it enters the final *failed* state. The scheduler waits
+some time until running the next try. This delay is given by
+`retry-delay`. This is the initial delay, the time until the first
+re-try (the second attempt). This time increases exponentially with
+the number of retries.
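+
+To illustrate, here is a sketch of the timing with the default
+values. It assumes that the delay doubles with each retry; the text
+above only states that it grows exponentially, so the exact factor is
+an assumption:
+
+```
+docspell.joex.scheduler {
+  retries = 5
+  retry-delay = "1 minute"
+
+  # Assuming a doubling delay (illustration only), each delay
+  # counted from the preceding failed attempt:
+  #   1st retry after 1 minute
+  #   2nd retry after 2 minutes
+  #   3rd retry after 4 minutes
+  #   4th retry after 8 minutes
+  #   5th retry after 16 minutes
+}
+```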
+
+The jobs will log about what they do, which is picked up and stored
+into the database asynchronously. The log events are buffered in a
+queue and another thread will consume this queue and store them in the
+database. The `log-buffer-size` determines the size of the queue.
+
+Finally, there is a `wakeup-period` that determines at what interval
+the joex component notifies itself to look for new jobs. If jobs get
+stuck and joex is not notified externally, it could fail to retry
+them. Also, since networks are not reliable, a notification may not
+reach a joex component. This periodic wakeup is just to ensure that
+jobs are eventually run.
+
+
+## Starting on demand
+
+The job executor and rest server can be started multiple times. This
+is especially useful for the job executor. For example, when
+submitting a lot of files in a short time, you can simply startup more
+job executors on other computers on your network. Maybe use your
+laptop to help with processing for a while.
+
+You have to make sure that all connect to the same database and that
+all have unique `app-id`s.
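+
+A config sketch for such a temporary helper could look like this. The
+id `joex-laptop`, the addresses and the database credentials are
+placeholders made up for illustration:
+
+```
+docspell.joex {
+  # must differ from the ids of all other instances
+  app-id = "joex-laptop"
+
+  # must be resolvable from the REST server
+  baseurl = "http://192.168.101.20:7878"
+
+  bind {
+    address = "0.0.0.0"
+    port = 7878
+  }
+
+  # the same database all other components use
+  jdbc {
+    url = "jdbc:postgresql://dbhost:5432/docspelldb"
+    user = "dbuser"      # hypothetical user
+    password = "dbpass"  # hypothetical password
+  }
+}
+```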
+
+Once the files have been processed, you can stop the additional
+executors.
+
+## Shutting down
+
+If a job executor is sleeping and not executing any jobs, you can just
+quit using SIGTERM or `Ctrl-C` when running in a terminal. But if
+there are jobs currently executing, it is advisable to initiate a
+graceful shutdown. The job executor will then stop taking new jobs
+from the queue but it will wait until all running jobs have completed
+before shutting down.
+
+This can be done by sending a http POST request to the api of this job
+executor:
+
+```
+curl -XPOST "http://localhost:7878/api/v1/shutdownAndExit"
+```
+
+If joex receives this request it will immediately stop taking new jobs
+and it will quit when all running jobs are done.
+
+If a job executor gets terminated while there are running jobs, the
+jobs are still in the current state marked to be executed by this job
+executor. In order to fix this, start the job executor again. It will
+search all jobs that are marked with its id and put them back into
+waiting state. Then send a graceful shutdown request as shown above.
--- a/modules/microsite/docs/doc/metadata.md
+++ b/modules/microsite/docs/doc/metadata.md
@@ -0,0 +1,87 @@
+---
+layout: docs
+title: Adding Meta Data
+---
+
+# {{ page.title }}
+
+## Meta Data
+
+The processing can be controlled implicitly by the provided meta
+data. The *Meta Data* page lets you manage this meta data. You can
+create the following:
+
+- Tags
+- Organizations
+- Persons
+- Equipments
+
+### Tags
+
+Items can be tagged with multiple custom tags (aka labels). This
+makes it possible to describe many different workflows people may
+have with their documents.
+
+A tag can have a *category*. This is meant to group tags together. For
+example, you may want to have a tag category *doctype* that is
+comprised of tags like *bill*, *contract*, *receipt* and so on. Or for
+workflows, a tag category *state* may exist that includes tags like
+*Todo* or *Waiting*. Or you can tag items with user names to provide
+"assignment" semantics. Docspell doesn't propose any workflow, but it
+can help to implement some.
+
+The tags are *not* taken into account when processing. Docspell will
+not automatically associate tags to your items. The tags are only
+meant to be used manually.
+
+
+### Organization and Person
+
+The organization entity represents a non-personal (organization or
+company) correspondent of an item. Docspell will choose one or more
+organizations when processing documents and associate the "best" match
+with your item.
+
+The person entity can appear in two roles: It may be a correspondent
+or the person an item is about. So a person is either a correspondent
+or a concerning person. Docspell cannot know which person is which,
+therefore you need to tell this by checking the box "Use for
+concerning person suggestion only". If this is checked, docspell will
+use this person only to suggest a concerning person. Otherwise the
+person is used only for correspondent suggestions.
+
+Document processing uses the following properties:
+
+- name
+- websites
+- e-mails
+
+The websites and e-mails can be added as contact information. If these
+three are present, you should get good matches from docspell. All
+other fields of an organization and person are not used during
+document processing. They might be useful when using this as a real
+address book.
+
+
+### Equipment
+
+The equipment entity is almost like a tag. In fact, it could be
+replaced by a tag with a specific known category. The difference is
+that docspell will try to find a match and associate it with your
+item. The equipment represents non-personal things that an item is
+about. Examples are: bills or insurances for *cars*, contracts for
+*houses* or *flats*.
+
+Equipments don't have contact information, so the only property that
+is used to find matches during document processing is its name.
+
+
+## Document Language
+
+An important setting is the language of your documents. This helps OCR
+and text analysis. You can select between English and German
+currently.
+
+Go to the *Collective Settings* page and click *Document
+Language*. This will set the language for all your documents. It is
+not (yet) possible to specify it when uploading.
--- a/modules/microsite/docs/doc/processing.md
+++ b/modules/microsite/docs/doc/processing.md
@@ -0,0 +1,40 @@
+---
+layout: docs
+title: Processing Queue
+---
+
+# {{ page.title }}
+
+
+The page *Processing Queue* shows the current state of document
+processing for your uploads.
+
+At the top of the page a list of running jobs is shown. Below that,
+the left column shows jobs that wait to be picked up by the job
+executor. On the right are finished jobs. The number of finished jobs
+is cut to some maximum and is also restricted by a date range. The
+page refreshes itself automatically to show the progress.
+
+Example screenshot:
+
+<div class="thumbnail">
+  <img src="../img/processing-queue.jpg">
+</div>
+
+You can cancel running jobs or remove waiting ones from the queue. If
+you click on the small file symbol on finished jobs, you can inspect
+its log messages again. A running job displays the job executor id
+that executes the job.
+
+Currently the job queue executes just the document processing tasks,
+but it may be used for other long running tasks in the future.
+
+Since job executors are shared among all collectives, it may happen
+that a job waits for some time until it is picked up by a job
+executor. You can always start more job executors to help out.
+
+If a job fails, it is retried after some time. Only if it fails too
+often (this is configurable) does it finish with *failed* state. If
+processing finally fails, the item is still created, just without
+suggestions. But if processing is cancelled by the user, the item is
+not created.
--- a/modules/microsite/docs/doc/tools.md
+++ b/modules/microsite/docs/doc/tools.md
@@ -0,0 +1,187 @@
+---
+layout: docs
+title: Tools
+---
+
+# {{ page.title }}
+
+The `tools/` folder contains some scripts and other resources intended
+for integrating docspell.
+
+## consumedir
+
+The `consumedir.sh` is a bash script that works in two modes:
+
+- Go through all files in given directories (non recursively) and send
+  each to docspell.
+- Watch one or more directories for new files and upload them to
+  docspell.
+
+It can watch or go through one or more directories. Files can be
+uploaded to multiple urls.
+
+Run the script with the `-h` option, to see a short help text. The
+help text will also show the values for any given option.
+
+The script requires `curl` for uploading. It requires the
+`inotifywait` command if directories should be watched for new
+files. If the `-m` option is used, the script will skip duplicate
+files. For this the `sha256sum` command is required.
+
+Example for watching two directories:
+
+``` bash
+./tools/consumedir.sh --path ~/Downloads --path ~/pdfs -m /var/run/consumedir -dv http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
+```
+
+The script by default watches the given directories. If the `-o`
+option is used, it will instead go through these directories and
+upload all pdf files in there.
+
+Example for uploading all immediately (the same as above only with `-o`
+added):
+
+``` bash
+./tools/consumedir.sh -o --path ~/Downloads --path ~/pdfs/ -m /var/run/consumedir -dv http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
+```
+
+
+### Systemd
+
+The script can be used with systemd to run as a service. This is an
+example unit file:
+
+```
+[Unit]
+After=network.target
+Description=Docspell Consumedir
+
+[Service]
+Environment="PATH=/set/a/path"
+
+# systemd does not run commands through a shell, so `&&` is not
+# available; each step gets its own line with an absolute path.
+ExecStartPre=/bin/mkdir -p /var/run/consumedir
+ExecStartPre=/bin/chown -R someuser /var/run/consumedir
+ExecStart=/bin/su -s /bin/bash someuser -c "consumedir.sh --path '/a/path/' -m '/var/run/consumedir' 'http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ'"
+```
+
+This unit file is just an example, it needs some fiddling. It assumes
+an existing user `someuser` that is used to run this service. The url
+`http://localhost:7880/api/v1/open/upload/...` is an anonymous upload
+url as described [here](./uploading.html).
+
+
+## ds.sh
+
+A bash script to quickly upload files from the command line. It reads
+a configuration file containing the URLs to upload to. Then each file
+given to the script will be uploaded to all URLs in the config.
+
+The config file is expected in
+`$XDG_CONFIG_HOME/docspell/ds.conf`. `$XDG_CONFIG_HOME` defaults to
+`~/.config`.
+
+The config file contains lines with key-value pairs, separated by an
+`=` sign. Lines starting with `#` are ignored. Example:
+
+```
+# Config file
+url.1 = http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
+url.2 = http://localhost:7880/api/v1/open/upload/item/6DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
+```
+
+The key must start with `url`.
+
+### Usage
+
+The `-h` option shows a help overview.
+
+The script takes a list of files as arguments. It checks the file
+types and will raise an error (and quit) if a file is included that is
+not a PDF. The `-s` option can be used to skip them instead.
+
+The `-c` option allows specifying a different config file.
+
+Example:
+
+``` bash
+./ds.sh ~/Downloads/*.pdf
+```
+
+
+## Webextension for Docspell
+
+The idea: click on a PDF link inside the browser and send the file to
+docspell. The file is downloaded in the context of your current page
+and then handed to an application that pushes it to docspell. A
+browser add-on implementing this is in `tools/webextension`. This
+add-on only works with Firefox.
+
+### Install
+
+This is a bit complicated, since you need to install external tools
+and the web extension. Both work together.
+
+#### Install `ds.sh`
+
+First install the `ds.sh` tool somewhere, maybe `/usr/local/bin` as
+described above.
+
+
+#### Install the native part
+
+Then install the "native" part of the web extension:
+
+Copy or symlink the `native.py` script into some known location. For
+example:
+
+``` bash
+ln -s ~/docspell-checkout/tools/webextension/native/native.py /usr/local/share/docspell/native.py
+```
+
+Then copy the `app_manifest.json` to
+`$HOME/.mozilla/native-messaging-hosts/docspell.json`. For example:
+
+``` bash
+cp ~/docspell-checkout/tools/webextension/native/app_manifest.json  ~/.mozilla/native-messaging-hosts/docspell.json
+```
+
+See
+[here](https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Native_manifests#Manifest_location)
+for details.
+
+You may need to edit this json file so that the path to the
+`native.py` script is correct (it must be absolute).
+
+If the `ds.sh` script is in your `$PATH`, then this should
+work. Otherwise, edit the `native.py` script and change the path to
+the tool. Or create a file `$HOME/.config/docspell/ds.cmd` whose
+content is the path to the `ds.sh` script.
+
+
+#### Install the extension
+
+An extension file can be built using the `make-xpi.sh` script. But
+installing it in "standard" Firefox won't work, because [Mozilla
+requires extensions to be signed by
+them](https://wiki.mozilla.org/Add-ons/Extension_Signing). This means
+creating an account and going through a review process. So here are
+two alternatives:
+
+1. Open Firefox and type `about:debugging` in the address bar. Then
+   click on *'Load Temporary Add-on...'* and select the
+   `manifest.json` file. The extension is now installed. The downside
+   is that the extension will be removed once Firefox is closed.
+2. Use Firefox ESR, which allows installing add-ons not signed by
+   Mozilla. But it has to be configured: Open Firefox and type
+   `about:config` in the address bar. Search for the key
+   `xpinstall.signatures.required` and set it to `false`. This is
+   described in the last paragraph of [this
+   page](https://support.mozilla.org/en-US/kb/add-on-signing-in-firefox).
+
+When you right click on a file link, there should be a context menu
+entry *'Docspell Upload Helper'*. The add-on will download this file
+using the browser and then send the file path to the `native.py`
+script. This script will in turn call `ds.sh` which finally uploads it
+to your configured URLs.
+
+To verify the installation, open the Add-ons page
+(`Ctrl`+`Shift`+`A`). The new add-on should be listed there.
--- a/modules/microsite/docs/doc/uploading.md
+++ b/modules/microsite/docs/doc/uploading.md
@@ -0,0 +1,130 @@
+---
+layout: docs
+title: Uploads
+---
+
+# {{page.title}}
+
+
+This page describes how files can get into docspell. Technically,
+there is just one way: via http multipart/form-data requests.
+
+
+## Authenticated Upload
+
+From within the web application there is the "Upload Files"
+page. There you can select multiple files to upload. You can also
+specify whether these files should become one item or if every file is
+a separate item.
+
+When you click "Submit" the files are uploaded and stored in the
+database. Then the job executor(s) are notified and immediately
+start processing them.
+
+Go to the top-right menu and click "Processing Queue" to see the
+current state.
+
+This requires an authenticated user. While this is handy for ad-hoc
+uploads, it is inconvenient for automating uploads with custom
+scripts. For this the next variant exists.
+
+## Anonymous Upload
+
+It is also possible to upload files without authentication. This
+should make tools that interact with docspell much easier to write.
+
+
+### Creating Anonymous Uploads
+
+Go to "Collective Settings" and then to the "Source" tab. A *Source*
+identifies an endpoint where files can be uploaded
+anonymously. Creating a new source creates a long unique id which is
+part of a url that can be used to upload files. You can deactivate
+or delete the source at any time, at which point uploading is no
+longer possible. The idea is that this URL can be given away
+safely: you can delete it at any time and it reveals no passwords
+or secrets, not even your username.
+
+Example screenshot:
+
+<div class="thumbnail">
+  <img src="../img/sources-form.jpg">
+</div>
+
+This example shows a source with name "test". It defines two urls:
+
+- `/app/index.html#/upload/<id>`
+- `/api/v1/open/upload/item/<id>`
+
+The first points to a web page where anyone can upload files into
+your account. You could give this url to people for sending files
+directly into your docspell.
+
+The second url is the API url, which accepts the requests to upload
+files (which is used by the first url).
+
+For example, this url can be used to upload files with curl:
+
+``` bash
+$ curl -XPOST -F file=@test.pdf http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
+{"success":true,"message":"Files submitted."}
+```
+
+You could add more `-F file=@/path/to/your/file.pdf` to upload
+multiple files (note, the `@` is required by curl, so it knows that
+the following is a file).
+
+When files are uploaded to a source endpoint, the items resulting
+from these uploads are marked with the name of the source. So you
+know which source an item originated from.
+
+If files are uploaded using the web application's *Upload files*
+page, the source is implicitly set to `webapp`. If you also want to
+let docspell count the files uploaded through the web interface, just
+create a source (can be inactive) with that name (`webapp`).
+
+
+## The Request
+
+This gives more details about the request for uploads. It is a http
+`multipart/form-data` request, with two possible fields:
+
+- meta
+- file
+
+The `file` field can appear multiple times and is required at least
+once. It is the part containing the file to upload.
+
+The `meta` part is completely optional and can define additional meta
+data that docspell uses to create items from the given files. It
+allows structured information to be transferred together with the
+unstructured binary files.
+
+The `meta` content must be `application/json` containing this
+structure:
+
+```
+{ multiple: Bool
+, direction: Maybe String
+}
+```
+
+The `multiple` property is by default `true`. It means that each file
+in the upload request corresponds to a single item. An upload with 5
+files will result in 5 items created. If it is `false`, then docspell
+will create just one item, that will then contain all files.
+
+Furthermore, the direction of the document (one of `incoming` or
+`outgoing`) can be given. It is optional: it can be left out or set
+to `null`.
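
As a small sketch of how a script might produce the `meta` part, the helper below prints the JSON for the two properties described above and rejects values outside the documented ones. The function name `meta_json` is hypothetical and not part of docspell or its tools.

```shell
# Hypothetical helper: print the JSON for the `meta` form field.
# $1 = "true" or "false" (the `multiple` property)
# $2 = optional direction, "incoming" or "outgoing"; null when omitted
meta_json() {
  case "$1" in
    true|false) ;;
    *) echo "multiple must be true or false" >&2; return 1 ;;
  esac
  case "${2:-}" in
    "")
      printf '{"multiple":%s,"direction":null}\n' "$1" ;;
    incoming|outgoing)
      printf '{"multiple":%s,"direction":"%s"}\n' "$1" "$2" ;;
    *)
      echo "direction must be incoming or outgoing" >&2; return 1 ;;
  esac
}
```

The result can be passed to curl as a form field, for example `-F meta="$(meta_json false outgoing)"`, next to one or more `-F file=@...` fields.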
+
+This kind of request is very common and most programming languages
+have support for this. For example, here is another curl command
+uploading two files with meta data:
+
+```
+curl -XPOST -F meta='{"multiple":false, "direction": "outgoing"}' \
+            -F file=@letter-en-source.pdf \
+            -F file=@letter-de-source.pdf \
+            http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
+```