Initial website

This commit is contained in:
Eike Kettner
2020-07-27 22:13:22 +02:00
parent dbd0f3ff97
commit f8c6f79b10
160 changed files with 8854 additions and 64 deletions

@ -0,0 +1,9 @@
+++
title = "Installation and Deployment"
description = "There are multiple ways to install Docspell. This section contains detailed instructions."
weight = 30
sort_by = "weight"
insert_anchor_links = "right"
template = "pages.html"
redirect_to = "/docs/install/quickstart"
+++

@ -0,0 +1,324 @@
+++
title = "Installing"
weight = 20
+++
# Docker
There is a [docker-compose](https://docs.docker.com/compose/) setup
available in the `/docker` folder. This setup also takes care of
all the necessary [prerequisites](@/docs/install/prereq.md) and
creates a container to watch a directory for incoming files. It takes
only a few steps:
1. Clone the GitHub repository:
```bash
$ git clone https://github.com/eikek/docspell
```
2. Change into the `docker` directory:
```bash
$ cd docspell/docker
```
3. Run `docker-compose up`:
```bash
$ export DOCSPELL_HEADER_VALUE="my-secret-123"
$ docker-compose up
```
The environment variable defines a secret that is shared between
some containers. You can define whatever you like. Please see the
[consumedir.sh](@/docs/tools/consumedir.md#docker) docs for
additional info.
4. Go to `http://localhost:7880`, sign up and log in. When signing up,
you can choose the same name for collective and user. Then log in
with this name and the password.
5. (Optional) Create a folder `./docs/<collective-name>` (the name you
chose for the collective at registration) and place files in there
for importing them.
The directory contains a file `docspell.conf` that you can
[modify](@/docs/configure/_index.md) as needed.
# Download, Unpack, Run
You can install via zip or deb archives. Please see the
[prerequisites](@/docs/install/prereq.md) first.
## Using zip files
You need to download the two files:
- [docspell-restserver-{{version()}}.zip](https://github.com/eikek/docspell/releases/download/v{{version()}}/docspell-restserver-{{version()}}.zip)
- [docspell-joex-{{version()}}.zip](https://github.com/eikek/docspell/releases/download/v{{version()}}/docspell-joex-{{version()}}.zip)
1. Unzip both files:
``` bash
$ unzip docspell-*.zip
```
2. Open two terminal windows and navigate to the directory
containing the zip files.
3. Start both components by executing:
``` bash
$ ./docspell-restserver*/bin/docspell-restserver
```
in one terminal and
``` bash
$ ./docspell-joex*/bin/docspell-joex
```
in the other.
4. Point your browser to: <http://localhost:7880/app>
5. Register a new account, sign in and try it.
Note that this setup doesn't include watching a directory. You can
use the [consumedir.sh](@/docs/tools/consumedir.md) tool for this or
use the docker variant above.
## Using deb files
The DEB packages can be installed on Debian or Debian-based distributions:
``` bash
$ sudo dpkg -i docspell*.deb
```
Then the start scripts are in your `$PATH`. Run `docspell-restserver`
or `docspell-joex` from a terminal window.
The packages come with a systemd unit file that will be installed to
autostart the services.
# Nix
## Install via Nix
Docspell can be installed via the [nix](https://nixos.org/nix) package
manager, which is available for Linux and OSX. Docspell is currently not
part of the [nixpkgs collection](https://nixos.org/nixpkgs/), but you
can use the derivation from this repository. This is sometimes
referred to as [import from
derivation](https://nixos.wiki/wiki/Import_From_Derivation).
For example, the `builtins.fetchTarball` function can be used to
retrieve the files; then import the `release.nix` file:
``` nix
let
  docspellsrc = builtins.fetchTarball "https://github.com/eikek/docspell/archive/master.tar.gz";
in
import "${docspellsrc}/nix/release.nix";
```
This creates a set containing a function for creating a derivation for
docspell. This then needs to be called like other custom packages. For
example, in your `~/.nixpkgs/config.nix` you could write this:
``` nix
let
  docspellsrc = builtins.fetchTarball "https://github.com/eikek/docspell/archive/master.tar.gz";
  docspell = import "${docspellsrc}/nix/release.nix";
in
{ packageOverrides = pkgs:
    let
      callPackage = pkgs.lib.callPackageWith(custom // pkgs);
      custom = {
        docspell = callPackage docspell.currentPkg {};
      };
    in custom;
}
```
The `docspell` custom package is again a set that contains derivations
for all 3 installable docspell programs: the restserver, joex and the
tools.
Then you can install docspell via `nix-shell` or `nix-env`, for example:
``` bash
$ nix-env -iA nixpkgs.docspell.server nixpkgs.docspell.joex nixpkgs.docspell.tools
```
You may need to replace `nixpkgs` with `nixos` when you're on NixOS.
The expression `docspell.currentPkg` is a shortcut for selecting the
most current release of Docspell. Even if you use the tarball of the
current master branch, the `release.nix` file only contains
derivations for releases. For example, `docspell.currentPkg`
translates to `docspell.pkg docspell.cfg.v{{ pversion() }}` if the
current version is `{{version()}}`.
## Docspell on NixOS {#nixos}
If you are running [NixOS](https://nixos.org), there is a module
definition for installing Docspell as a service using systemd.
The following modules are provided:
- restserver
- joex
- consumedir
The `consumedir` module defines a systemd unit that starts the
`consumedir.sh` script to watch one or more directories for new files.
You need to import the `release.nix` file as described above in your
`configuration.nix` and then append the docspell module to your list of
modules. Here is an example:
```nix
{ config, pkgs, ... }:
let
  docspellsrc = builtins.fetchTarball "https://github.com/eikek/docspell/archive/master.tar.gz";
  docspell = import "${docspellsrc}/nix/release.nix";
in
{
  imports = [ mymodule1 mymodule2 ] ++ docspell.modules;
  nixpkgs = {
    config = {
      packageOverrides = pkgs:
        let
          callPackage = pkgs.lib.callPackageWith(custom // pkgs);
          custom = {
            docspell = callPackage docspell.currentPkg {};
          };
        in custom;
    };
  };
  services.docspell-restserver = {
    enable = true;
    base-url = "http://docspelltest:7880";
    # ... more settings here
  };
  services.docspell-joex = {
    enable = true;
    base-url = "http://docspelltest:7878";
    # ... more settings here
  };
  services.docspell-consumedir = {
    enable = true;
    watchDirs = ["/tmp/test"];
    urls = ["http://localhost:7880/api/v1/open/upload/item/the-source-id"];
  };
  ...
}
```
Please see the `nix/module-server.nix` and `nix/module-joex.nix` files
for the set of options. The NixOS options are modelled after the
default configuration file.
The module files are only applicable to the newest version of
Docspell. If you really need an older version, check out the
appropriate commit.
## NixOS Example
This is an example system configuration that installs docspell with a
postgres database. This snippet can be used to create a VM (using
`nixos-rebuild build-vm`) or a container, for example.
``` nix
{ config, pkgs, ... }:
let
  docspellsrc = builtins.fetchTarball "https://github.com/eikek/docspell/archive/master.tar.gz";
  docspell = import "${docspellsrc}/nix/release.nix";
in
{
  imports = docspell.modules;
  nixpkgs = {
    config = {
      packageOverrides = pkgs:
        let
          callPackage = pkgs.lib.callPackageWith(custom // pkgs);
          custom = {
            docspell = callPackage docspell.currentPkg {};
          };
        in custom;
    };
  };
  ##### just for the example…
  users.users.root = {
    password = "root";
  };
  #####
  # install docspell-joex and enable the systemd service
  services.docspell-joex = {
    enable = true;
    base-url = "http://localhost:7878";
    bind = {
      address = "0.0.0.0";
      port = 7878;
    };
    scheduler = {
      pool-size = 1;
    };
    jdbc = {
      url = "jdbc:postgresql://localhost:5432/docspell";
      user = "docspell";
      password = "docspell";
    };
  };
  # install docspell-restserver and enable the systemd service
  services.docspell-restserver = {
    enable = true;
    base-url = "http://localhost:7880";
    bind = {
      address = "0.0.0.0";
      port = 7880;
    };
    auth = {
      server-secret = "b64:EirgaudMyNvWg4TvxVGxTu-fgtrto4ETz--Hk9Pv2o4=";
    };
    backend = {
      signup = {
        mode = "invite";
        new-invite-password = "dsinvite2";
        invite-time = "30 days";
      };
      jdbc = {
        url = "jdbc:postgresql://localhost:5432/docspell";
        user = "docspell";
        password = "docspell";
      };
    };
  };
  # install postgresql and initially create user/database
  services.postgresql =
    let
      pginit = pkgs.writeText "pginit.sql" ''
        CREATE USER docspell WITH PASSWORD 'docspell' LOGIN CREATEDB;
        GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO docspell;
        GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO docspell;
        CREATE DATABASE DOCSPELL OWNER 'docspell';
      '';
    in {
      enable = true;
      package = pkgs.postgresql_11;
      enableTCPIP = true;
      initialScript = pginit;
      port = 5432;
      authentication = ''
        host  all  all 0.0.0.0/0 md5
      '';
    };
  networking = {
    hostName = "docspellexample";
    firewall.allowedTCPPorts = [7880];
  };
}
```

@ -0,0 +1,107 @@
+++
title = "Prerequisites"
weight = 10
+++
The two components have one prerequisite in common: they both require
Java to run. While this is the only requirement for the *REST server*,
the *Joex* component requires some more external programs.
The rest server and joex components are not required to "see" each
other, though it is recommended.
# Java
Very often, Java is already installed. You can check this by opening a
terminal and typing `java -version`. Otherwise install Java using your
package manager or see [this site](https://adoptopenjdk.net/) for
other options.
It is enough to install the JRE. The JDK is only required if you want to
build docspell from source.
Docspell has been tested with Java version 1.8 (sometimes referred
to as JRE 8 and JDK 8, respectively). The pre-built packages are also
built using JDK 8, but a later version of Java should work as well.
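As a rough sketch of the version check described above, the following shell function extracts the major version from a Java version string. The function name `java_major` is made up for this illustration, and it assumes the two common version schemes (`1.8.0_252` for Java 8, `11.0.7` for later releases).

``` bash
#!/usr/bin/env bash
# Hypothetical helper: map a Java version string to its major version.
# "1.8.0_252" -> 8 (legacy 1.x scheme), "11.0.7" -> 11 (current scheme).
java_major() {
  local v="$1"
  v="${v#1.}"           # drop the legacy "1." prefix, if present
  echo "${v%%[._-]*}"   # keep everything before the first '.', '_' or '-'
}

# Usage sketch: `java -version` prints to stderr, and its first line
# carries the version in double quotes.
#   raw="$(java -version 2>&1 | head -n1 | cut -d'"' -f2)"
#   java_major "$raw"
```

A result of 8 or higher matches the versions mentioned above.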
The next tools are only required on machines running the *Joex*
component.
# External Programs for Joex
- [Ghostscript](http://pages.cs.wisc.edu/~ghost/) (the `gs` command)
is used to extract/convert PDF files into images that are then fed
to ocr. It is available on most GNU/Linux distributions.
- [Unpaper](https://github.com/Flameeyes/unpaper) is a program that
pre-processes images to yield better results when doing ocr. If this
is not installed, docspell tries without it. However, it is
recommended to install, because it [improves text
extraction](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality)
(at the expense of a longer runtime).
- [Tesseract](https://github.com/tesseract-ocr/tesseract) is the tool
doing the OCR (converts images into text). It can also convert
images into pdf files. It is a widely used open source OCR engine.
Tesseract 3 and 4 should work with docspell; you can adapt the
command line in the configuration file, if necessary.
- [Unoconv](https://github.com/unoconv/unoconv) is used to convert
office documents into PDF files. It uses libreoffice/openoffice.
- [wkhtmltopdf](https://wkhtmltopdf.org/) is used to convert HTML into
PDF files.
- [OCRmyPDF](https://github.com/jbarlow83/OCRmyPDF) can be optionally
used to convert PDF to PDF files. It adds an OCR layer to scanned
PDF files to make them searchable. It also creates PDF/A files from
the input pdf.
The performance of `unoconv` can be improved by starting `unoconv -l`
in a separate process. This runs a libreoffice/openoffice listener and
therefore avoids starting one each time `unoconv` is called.
## Example Debian
On Debian this should install all joex requirements:
``` bash
sudo apt-get install ghostscript tesseract-ocr tesseract-ocr-deu tesseract-ocr-eng unpaper unoconv wkhtmltopdf ocrmypdf
```
# Apache SOLR
SOLR is used to provide the fulltext search feature. This feature can
be disabled, so installing SOLR is optional. But without it, there is
no fulltext search.
When installing manually (i.e. not via docker), just install solr and
create a core as described in the [solr
documentation](https://lucene.apache.org/solr/guide/8_4/installing-solr.html).
That will provide you with the connection url (the last part is the
core name).
When using the provided `docker-compose.yml` setup, SOLR is already set up.
SOLR must be reachable from all joex and all rest server components.
# Database
Both components must have access to a SQL database. The SQL database
contains all data (including binary files) and is the central
component of docspell. Docspell supports these databases:
- PostgreSQL
- MariaDB
- H2
The H2 database is an interesting option for personal and mid-size
setups, as it requires no additional work. It is integrated into
docspell and works really well out of the box. It is also configured
as the default database.
When using H2, make sure that all components access the same database:
the jdbc url must point to the same file. Then, it is important to
add the options
`;MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE;AUTO_SERVER=TRUE` at the end
of the url. See the [config page](@/docs/configure/_index.md#jdbc) for
an example.
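To illustrate how the options are appended, here is a minimal sketch that builds such a url from a database file path. The helper name `h2_jdbc_url` and the path `/var/docspell/db` are hypothetical, and the `jdbc:h2://` prefix is an assumption about the url shape; check the config page for the authoritative form.

``` bash
#!/usr/bin/env bash
# Hypothetical helper: build an H2 jdbc url for a given database file
# and append the options listed above. Both components must be given
# the same file path.
h2_jdbc_url() {
  local db_file="$1"
  echo "jdbc:h2://${db_file};MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE;AUTO_SERVER=TRUE"
}

# Example with an assumed path:
h2_jdbc_url /var/docspell/db
```

The resulting value would go into the `jdbc.url` setting of both the rest server and joex configuration.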
For large installations, PostgreSQL or MariaDB is recommended. Create
a database and a user with enough privileges (read, write, create
table) to that database.
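For PostgreSQL, one possible way to do this is sketched below. The helper name `docspell_db_sql` and the user, password and database values are placeholders, not names Docspell requires. Making the user the owner of the database is one way to give it read, write and create-table rights.

``` bash
#!/usr/bin/env bash
# Hypothetical helper: print the SQL that creates a user and a
# database owned by that user. The values are illustrative only.
docspell_db_sql() {
  local user="$1" password="$2" db="$3"
  cat <<SQL
CREATE USER ${user} WITH PASSWORD '${password}';
CREATE DATABASE ${db} OWNER ${user};
SQL
}

# Usage sketch (assumes a local PostgreSQL with a "postgres" superuser):
#   docspell_db_sql docspell 'change-me' docspell | sudo -u postgres psql
```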

@ -0,0 +1,23 @@
+++
title = "Quickstart"
weight = 0
+++
To get started, here are some quick links:
- Using [docker and
docker-compose](@/docs/install/installing.md#docker). This sets up
everything: all prerequisites, both docspell components and a
container running the [consumedir.sh](@/docs/tools/consumedir.md)
script to import files that are dropped in a folder.
- [Download, Unpack and
Run](@/docs/install/installing.md#download-unpack-run). This option
is also very quick, but you need to check the
[prerequisites](@/docs/install/prereq.md) yourself. Database is
already set up, but you'd need to set up SOLR (when using fulltext
search) and install some programs for the joex component. This
applies to the `zip` and `deb` files. The files can be downloaded
from the [release page](https://github.com/eikek/docspell/releases/latest).
- Via the [nix package manager](@/docs/install/installing.md#nix) and/or as a [NixOS
module](@/docs/install/installing.md#nixos). If you use nix/nixos, you
know what to do. The linked page contains some examples.

@ -0,0 +1,95 @@
+++
title = "Reverse Proxy"
weight = 50
+++
This page contains examples for how to use docspell behind a reverse proxy.
For the examples below, assume the following:
- Docspell app is available at `192.168.1.11:7880`. If it is running
on the same machine as the reverse proxy server, you can set
`localhost:7880` instead.
- The external domain/hostname is `docspell.example.com`
## Configuring Docspell
These settings require complementary settings in the docspell
configuration file:
- First, if Docspell REST server is on a different machine, you need
to change the `bind.address` setting to be either `0.0.0.0` or the
ip address of the network interface that the reverse proxy server
connects to.
``` conf
docspell.server {
  # Where the server binds to.
  bind {
    address = "192.168.1.11"
    port = 7880
  }
}
```
Note that a value of `0.0.0.0` instead of `192.168.1.11` will bind
the server to every network interface.
- Docspell needs to know the external url. The `base-url` setting
must point to the external address. Using above values, it must be
set to `https://docspell.example.com`.
``` conf
docspell.server {
  # This is the base URL this application is deployed to. This is used
  # to create absolute URLs and to configure the cookie.
  base-url = "https://docspell.example.com"
  ...
}
```
Note that this example assumes that the docspell-joex component is on
the same machine. This page is only concerned with exposing the REST
server and web application.
If you have examples for more servers, please let me know or add them to
this site.
## Nginx
This defines two servers: one listens for http traffic and redirects
to the https variant. Additionally it defines the Let's Encrypt
`.well-known` folder name.
The https server endpoint is configured with the Let's Encrypt
certificates and acts as a proxy for the application at
`192.168.1.11:7880`.
``` conf
# $connection_upgrade is not a built-in nginx variable; define it
# once at the http level so the proxy_set_header below can use it.
map $http_upgrade $connection_upgrade {
  default upgrade;
  ''      close;
}
server {
  listen 0.0.0.0:80 ;
  listen [::]:80 ;
  server_name docspell.example.com ;
  location /.well-known/acme-challenge {
    root /var/data/nginx/ACME-PUBLIC;
    auth_basic off;
  }
  location / {
    return 301 https://$host$request_uri;
  }
}
server {
  listen 0.0.0.0:443 ssl http2 ;
  listen [::]:443 ssl http2 ;
  server_name docspell.example.com ;
  location /.well-known/acme-challenge {
    root /var/data/nginx/ACME-PUBLIC;
    auth_basic off;
  }
  ssl_certificate /var/lib/acme/docspell.example.com/fullchain.pem;
  ssl_certificate_key /var/lib/acme/docspell.example.com/key.pem;
  ssl_trusted_certificate /var/lib/acme/docspell.example.com/full.pem;
  location / {
    proxy_pass http://192.168.1.11:7880;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
  }
}
```

@ -0,0 +1,37 @@
+++
title = "Raspberry-Pi and Similar"
weight = 40
+++
# Raspberry Pi, and similar
Both components can run next to each other on a Raspberry Pi or
similar device.
## REST Server
The REST server component runs very well on the Raspberry Pi and
similar devices. It doesn't require many resources, because the heavy
work is done by the joex components.
## Joex
Running the joex component on the Raspberry Pi is possible, but will
result in long processing times for OCR. Files that don't require OCR
are no problem.
On an RPi model 3 (4 cores, 1G RAM), processing a two-page PDF
(scanned at 300dpi) took 9:52. You can speed it up
considerably by uninstalling the `unpaper` command, because this step
takes quite long. This, of course, reduces the quality of OCR. But
without `unpaper` the same sample PDF was processed in 1:24, which
saves about 8.5 minutes (roughly 7 times faster).
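The difference between the two timings can be worked out with a few lines of shell arithmetic. This is only an illustration of the numbers quoted above; the helper name `to_seconds` is made up for this sketch.

``` bash
#!/usr/bin/env bash
# Hypothetical helper: convert a "minutes:seconds" timing into seconds.
to_seconds() {
  local min="${1%%:*}" sec="${1##*:}"
  echo $(( 10#$min * 60 + 10#$sec ))   # 10# forces base 10 for values like "09"
}

with_unpaper=$(to_seconds "9:52")      # timing with unpaper installed
without_unpaper=$(to_seconds "1:24")   # timing without unpaper
saved=$(( with_unpaper - without_unpaper ))

# Integer division, so the factor is rounded down.
printf 'saved %d:%02d, about %dx faster\n' \
  $(( saved / 60 )) $(( saved % 60 )) $(( with_unpaper / without_unpaper ))
```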
You should limit the joex pool size to 1 and, depending on your model
and the amount of RAM, set a heap size of at least 500M
(`-J-Xmx500M`).
For personal setups, when you don't need the processing results asap,
this can work well enough.

@ -0,0 +1,66 @@
+++
title = "Running"
weight = 30
+++
# Running
Run the start script (in the corresponding `bin/` directory when using
the zip files):
```
$ ./docspell-restserver*/bin/docspell-restserver
$ ./docspell-joex*/bin/docspell-joex
```
This will start up both components using the default configuration. The
configuration should be adapted to your needs. For example, the
database connection is configured to use an H2 database in the `/tmp`
directory. Please refer to the [configuration
page](@/docs/configure/_index.md) for how to create a custom config
file. Once you have your config file, simply pass it as argument to
the command:
```
$ ./docspell-restserver*/bin/docspell-restserver /path/to/server-config.conf
$ ./docspell-joex*/bin/docspell-joex /path/to/joex-config.conf
```
After starting the rest server, you can reach the web application at
path `/app`, so using default values it would be
`http://localhost:7880/app`. There also is a redirect from `/` to
`/app`.
You should be able to create a new account and sign in. Check the
[configuration page](@/docs/configure/_index.md) to further customize
docspell.
## Options
The start scripts support some options to configure the JVM. One often
used setting is the maximum heap size of the JVM. By default, Java
determines it based on properties of the current machine. You can
specify it by passing Java startup options to the command:
```
$ ./docspell-restserver*/bin/docspell-restserver -J-Xmx1G -- /path/to/server-config.conf
```
This would limit the maximum heap to 1GB. The double dash (`--`) separates
internal options and the arguments to the program. Another frequently
used option is to change the default temp directory. Usually it is
`/tmp`, but it may be desired to have a dedicated temp directory,
which can be configured:
```
$ ./docspell-restserver*/bin/docspell-restserver -J-Xmx1G -Djava.io.tmpdir=/path/to/othertemp -- /path/to/server-config.conf
```
The command:
```
$ ./docspell-restserver*/bin/docspell-restserver -h
```
gives an overview of supported options.