2020-07-27 20:13:22 +00:00
|
|
|
|
+++
|
|
|
|
|
title = "Consume Directory"
|
|
|
|
|
description = "A script to watch a directory for new files and upload them to docspell."
|
2020-08-08 18:07:48 +00:00
|
|
|
|
weight = 30
|
2020-07-27 20:13:22 +00:00
|
|
|
|
+++
|
|
|
|
|
|
2020-07-30 20:27:10 +00:00
|
|
|
|
# Introduction
|
|
|
|
|
|
2020-07-27 20:13:22 +00:00
|
|
|
|
The `consumerdir.sh` is a bash script that works in two modes:
|
|
|
|
|
|
|
|
|
|
- Go through all files in given directories (recursively, if `-r` is
|
|
|
|
|
specified) and sent each to docspell.
|
|
|
|
|
- Watch one or more directories for new files and upload them to
|
|
|
|
|
docspell.
|
|
|
|
|
|
|
|
|
|
It can watch or go through one or more directories. Files can be
|
|
|
|
|
uploaded to multiple urls.
|
|
|
|
|
|
|
|
|
|
Run the script with the `-h` or `--help` option, to see a short help
|
|
|
|
|
text. The help text will also show the values for any given option.
|
|
|
|
|
|
|
|
|
|
The script requires `curl` for uploading. It requires the
|
|
|
|
|
`inotifywait` command if directories should be watched for new
|
|
|
|
|
files.
|
|
|
|
|
|
|
|
|
|
Example for watching two directories:
|
|
|
|
|
|
|
|
|
|
``` bash
|
|
|
|
|
./tools/consumedir.sh --path ~/Downloads --path ~/pdfs -m -dv http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The script by default watches the given directories. If the `-o` or
|
|
|
|
|
`--once` option is used, it will instead go through these directories
|
|
|
|
|
and upload all files in there.
|
|
|
|
|
|
|
|
|
|
Example for uploading all immediatly (the same as above only with `-o`
|
|
|
|
|
added):
|
|
|
|
|
|
|
|
|
|
``` bash
|
|
|
|
|
$ consumedir.sh -o --path ~/Downloads --path ~/pdfs/ -m -dv http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The URL can be any docspell url that accepts uploads without
|
|
|
|
|
authentication. This is usually a [source
|
|
|
|
|
url](@/docs/webapp/uploading.md#anonymous-upload). It is also possible
|
|
|
|
|
to use the script with the [integration
|
2020-11-14 20:48:52 +00:00
|
|
|
|
endpoint](@/docs/api/upload.md#integration-endpoint).
|
2020-07-27 20:13:22 +00:00
|
|
|
|
|
2021-01-24 22:24:11 +00:00
|
|
|
|
The script can be run multiple times and on on multiple machines, the
|
|
|
|
|
files are transferred via HTTP to the docspell server. For example, it
|
|
|
|
|
is convenient to set it up on your workstation, so that you can drop
|
|
|
|
|
files into some local folder to be immediatly transferred to docspell
|
|
|
|
|
(e.g. when downloading something from the browser).
|
2020-07-27 20:13:22 +00:00
|
|
|
|
|
|
|
|
|
## Integration Endpoint
|
|
|
|
|
|
|
|
|
|
When given the `-i` or `--integration` option, the script changes its
|
|
|
|
|
behaviour slightly to work with the [integration
|
2020-11-14 20:48:52 +00:00
|
|
|
|
endpoint](@/docs/api/upload.md#integration-endpoint).
|
2020-07-27 20:13:22 +00:00
|
|
|
|
|
|
|
|
|
First, if `-i` is given, it implies `-r` – so the directories are
|
|
|
|
|
watched or traversed recursively. The script then assumes that there
|
|
|
|
|
is a subfolder with the collective name. Files must not be placed
|
|
|
|
|
directly into a folder given by `-p`, but below a sub-directory that
|
|
|
|
|
matches a collective name. In order to know for which collective the
|
|
|
|
|
file is, the script uses the first subfolder.
|
|
|
|
|
|
2020-08-21 22:18:56 +00:00
|
|
|
|
If the endpoint is protected, the credentials can be specified as
|
2020-07-27 20:13:22 +00:00
|
|
|
|
arguments `--iuser` and `--iheader`, respectively. The format is for
|
|
|
|
|
both `<name>:<value>`, so the username cannot contain a colon
|
|
|
|
|
character (but the password can).
|
|
|
|
|
|
|
|
|
|
Example:
|
|
|
|
|
``` bash
|
|
|
|
|
$ consumedir.sh -i -iheader 'Docspell-Integration:test123' -m -p ~/Downloads/ http://localhost:7880/api/v1/open/integration/item
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The url is the integration endpoint url without the collective, as
|
|
|
|
|
this is amended by the script.
|
|
|
|
|
|
|
|
|
|
This watches the folder `~/Downloads`. If a file is placed in this
|
|
|
|
|
folder directly, say `~/Downloads/test.pdf` the upload will fail,
|
|
|
|
|
because the collective cannot be determined. Create a subfolder below
|
|
|
|
|
`~/Downloads` with the name of a collective, for example
|
|
|
|
|
`~/Downloads/family` and place files somewhere below this `family`
|
|
|
|
|
subfolder, like `~/Downloads/family/test.pdf`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Duplicates
|
|
|
|
|
|
|
|
|
|
With the `-m` option, the script will not upload files that already
|
|
|
|
|
exist at docspell. For this the `sha256sum` command is required.
|
|
|
|
|
|
|
|
|
|
So you can move and rename files in those folders without worring
|
|
|
|
|
about duplicates. This allows to keep your files organized using the
|
|
|
|
|
file-system and have them mirrored into docspell as well.
|
|
|
|
|
|
|
|
|
|
|
2021-01-24 22:24:11 +00:00
|
|
|
|
## Network Filesystems (samba cifs, nfs)
|
|
|
|
|
|
|
|
|
|
Watching a directory for changes relies on `inotify` subsystem on
|
|
|
|
|
linux. This doesn't work on network filesystems like nfs or cifs. Here
|
|
|
|
|
are some ideas to get around this limitation:
|
|
|
|
|
|
|
|
|
|
1. The `consumedir.sh` is just a shell script and doesn't need to run
|
|
|
|
|
on the same machine as docspell. (Note that the default docker
|
|
|
|
|
setup is mainly for demoing and quickstart, it's not required to
|
|
|
|
|
run all of them on one machine). So the best option is to put the
|
|
|
|
|
consumedir on the machine that contains the local filesystem. All
|
|
|
|
|
files are send via HTTP to the docspell server anyways, so there is
|
|
|
|
|
no need to first transfer them via a network filesystem or rsync.
|
|
|
|
|
2. If option 1 is not possible for some reason, and you need to check
|
|
|
|
|
a network filesystem, the only option left (that I know) is to
|
|
|
|
|
periodically poll this directory. This is also possible with
|
|
|
|
|
consumedir, using the `-o` or `--once` option (see above). You'd
|
|
|
|
|
need to setup the systemd unit file a bit differently and add a
|
|
|
|
|
timer to it (or you can use cron or something more fancy…).
|
|
|
|
|
3. Copy the files to the machine that runs consumedir, via rsync for
|
|
|
|
|
example. Note that this has no advantage over otpion 1, as you now
|
|
|
|
|
need to setup rsync on the other machine to run either periodically
|
|
|
|
|
or when some file arrives. Then you can as well run the consumedir
|
|
|
|
|
script. But it might be more convenient, if rsync is already
|
|
|
|
|
running.
|
|
|
|
|
|
2020-07-30 20:27:10 +00:00
|
|
|
|
# Systemd
|
2020-07-27 20:13:22 +00:00
|
|
|
|
|
|
|
|
|
The script can be used with systemd to run as a service. This is an
|
|
|
|
|
example unit file:
|
|
|
|
|
|
|
|
|
|
``` systemd
|
|
|
|
|
[Unit]
|
|
|
|
|
After=networking.target
|
|
|
|
|
Description=Docspell Consumedir
|
|
|
|
|
|
|
|
|
|
[Service]
|
|
|
|
|
Environment="PATH=/set/a/path"
|
|
|
|
|
|
|
|
|
|
ExecStart=/bin/su -s /bin/bash someuser -c "consumedir.sh --path '/a/path/' -m 'http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ'"
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
This unit file is just an example, it needs some fiddling. It assumes
|
|
|
|
|
an existing user `someuser` that is used to run this service. The url
|
|
|
|
|
`http://localhost:7880/api/v1/open/upload/...` is an anonymous upload
|
|
|
|
|
url as described [here](@/docs/webapp/uploading.md#anonymous-upload).
|
|
|
|
|
|
|
|
|
|
|
2020-07-30 20:27:10 +00:00
|
|
|
|
# Docker
|
2020-07-27 20:13:22 +00:00
|
|
|
|
|
2021-01-24 22:24:11 +00:00
|
|
|
|
The provided docker-compose setup runs this script to watch a single
|
2020-07-27 20:13:22 +00:00
|
|
|
|
directory, `./docs` in current directory, for new files. If a new file
|
|
|
|
|
is detected, it is pushed to docspell.
|
|
|
|
|
|
|
|
|
|
This utilizes the [integration
|
2020-11-14 20:48:52 +00:00
|
|
|
|
endpoint](@/docs/api/upload.md#integration-endpoint), which is
|
2020-07-27 20:13:22 +00:00
|
|
|
|
enabled in the config file, to allow uploading documents for all
|
|
|
|
|
collectives. A subfolder must be created for each registered
|
|
|
|
|
collective. The docker containers are configured to use http-header
|
|
|
|
|
protection for the integration endpoint. This requires you to provide
|
|
|
|
|
a secret, that is shared between the rest-server and the
|
|
|
|
|
`consumedir.sh` script. This can be done by defining an environment
|
|
|
|
|
variable which gets picked up by the containers defined in
|
|
|
|
|
`docker-compose.yml`:
|
|
|
|
|
|
|
|
|
|
``` bash
|
|
|
|
|
export DOCSPELL_HEADER_VALUE="my-secret"
|
|
|
|
|
docker-compose up
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Now you can create a folder `./docs/<collective-name>` and place all
|
|
|
|
|
files in there that you want to import. Once dropped in this folder
|
|
|
|
|
the `consumedir` container will push it to docspell.
|