mirror of https://github.com/TheAnachronism/docspell.git
synced 2025-05-31 20:42:51 +00:00

Extend consumedir.sh to work with integration endpoint

Now running one consumedir script can upload files to multiple collectives separately.

This commit is contained in:
parent d13e0a4370
commit 8500d4d804
@@ -8,21 +8,20 @@ permalink: doc/tools/consumedir

The `consumedir.sh` is a bash script that works in two modes:

- Go through all files in given directories (non-recursively) and send
  each to docspell.
- Go through all files in given directories (recursively, if `-r` is
  specified) and send each to docspell.
- Watch one or more directories for new files and upload them to
  docspell.

It can watch or go through one or more directories. Files can be
uploaded to multiple urls.
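
For instance, a single invocation can watch one directory and send
each file to two different upload urls (the urls below are
placeholders for your own source urls):

``` bash
./tools/consumedir.sh --path ~/Downloads \
  http://localhost:7880/api/v1/open/upload/item/<source-id-1> \
  http://localhost:7880/api/v1/open/upload/item/<source-id-2>
```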

Run the script with the `-h` option to see a short help text. The
help text will also show the values for any given option.
Run the script with the `-h` or `--help` option to see a short help
text. The help text will also show the values for any given option.

The script requires `curl` for uploading. It requires the
`inotifywait` command if directories should be watched for new
files. If the `-m` option is used, the script will skip duplicate
files. For this the `sha256sum` command is required.
files.
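
To quickly verify that these commands are available, a check like the
following can be run beforehand (just a convenience sketch, not part
of the script):

``` bash
for cmd in curl inotifywait sha256sum; do
    command -v "$cmd" > /dev/null || echo "missing command: $cmd"
done
```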

Example for watching two directories:

@@ -30,18 +29,69 @@ Example for watching two directories:
./tools/consumedir.sh --path ~/Downloads --path ~/pdfs -m -dv http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
```

The script by default watches the given directories. If the `-o`
option is used, it will instead go through these directories and
upload all files in there.
The script by default watches the given directories. If the `-o` or
`--once` option is used, it will instead go through these directories
and upload all files in there.

Example for uploading all immediately (the same as above, only with
`-o` added):

``` bash
./tools/consumedir.sh -o --path ~/Downloads --path ~/pdfs/ -m -dv http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
$ consumedir.sh -o --path ~/Downloads --path ~/pdfs/ -m -dv http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
```


The URL can be any docspell url that accepts uploads without
authentication. This is usually a [source
url](../uploading#anonymous-upload). It is also possible to use the
script with the [integration
endpoint](../uploading#integration-endpoint).
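
Under the hood this is a plain multipart upload. The following curl
call is roughly what the script performs for such a url (the source id
is the placeholder value from the examples above; the file path is
made up):

``` bash
curl -XPOST -F file=@/path/to/some.pdf \
  http://localhost:7880/api/v1/open/upload/item/5DxhjkvWf9S-CkWqF3Kr892-WgoCspFWDo7-XBykwCyAUxQ
```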


## Integration Endpoint

When given the `-i` or `--integration` option, the script changes its
behaviour slightly to work with the [integration
endpoint](../uploading#integration-endpoint).

First, if `-i` is given, it implies `-r`, so the directories are
watched or traversed recursively. The script then assumes that there
is a subfolder with the collective name. Files must not be placed
directly into a folder given by `-p`, but below a sub-directory that
matches a collective name. To know which collective a file belongs
to, the script uses the first subfolder below the watched directory.
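
For illustration, this is roughly how the collective is derived from a
file path, mirroring the `getCollective` helper added to the script in
this change (the paths are hypothetical examples):

``` bash
dir=/home/user/Downloads                   # a directory given via -p
file=/home/user/Downloads/family/test.pdf  # a file that appeared below it
rel=${file#"$dir"}                         # -> /family/test.pdf
coll=$(echo "$rel" | cut -d'/' -f2)        # first subfolder -> family
echo "$coll"
```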

If the endpoint is protected, the credentials can be specified with
the arguments `--iuser` and `--iheader`, respectively. The format for
both is `<name>:<value>`, so the username cannot contain a colon
character (but the password can).

Example:
``` bash
$ consumedir.sh -i -iheader 'Docspell-Integration:test123' -m -p ~/Downloads/ http://localhost:7880/api/v1/open/integration/item
```

The url is the integration endpoint url without the collective, as
the collective is appended by the script.
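
For a file below a collective folder named `family`, the request the
script effectively sends looks roughly like this (the header value is
taken from the example above; the file path and collective are made-up
examples):

``` bash
curl -XPOST -H 'Docspell-Integration:test123' \
  -F file=@/home/user/Downloads/family/test.pdf \
  http://localhost:7880/api/v1/open/integration/item/family
```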

This watches the folder `~/Downloads`. If a file is placed directly in
this folder, say `~/Downloads/test.pdf`, the upload will fail, because
the collective cannot be determined. Create a subfolder below
`~/Downloads` with the name of a collective, for example
`~/Downloads/family`, and place files somewhere below this `family`
subfolder, like `~/Downloads/family/test.pdf`.
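
For example, to prepare such a layout for a collective named `family`
(`some.pdf` stands in for any document):

``` bash
mkdir -p ~/Downloads/family
cp some.pdf ~/Downloads/family/
```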


## Duplicates

With the `-m` option, the script will not upload files that already
exist at docspell. For this the `sha256sum` command is required.

So you can move and rename files in those folders without worrying
about duplicates. This allows you to keep your files organized using
the file system and have them mirrored into docspell as well.
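
The duplicate check boils down to computing the file's SHA-256
checksum and asking docspell whether it already knows it. A minimal
sketch of what the script does in integration mode (header, collective
and file path as in the examples above):

``` bash
sum=$(sha256sum /home/user/Downloads/family/test.pdf | cut -d' ' -f1)
curl -XGET -H 'Docspell-Integration:test123' \
  "http://localhost:7880/api/v1/open/integration/checkfile/family/$sum"
# a response containing {"exists":true, ...} means the file was uploaded before
```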


## Systemd

The script can be used with systemd to run as a service. This is an
@@ -112,8 +112,15 @@ the [configuration file](configure#rest-server).
If queried by a `GET` request, it returns whether it is enabled and
whether the collective exists.

See the [SMTP gateway](tools/smtpgateway) for an example of using this
endpoint.
It is also possible to check for existing files using their sha256
checksum with:

```
/api/v1/open/integration/checkfile/[collective-name]/[sha256-checksum]
```
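
Assuming the endpoint is protected with the http header used in the
consumedir examples, such a check could be done with curl like this
(collective name and file are placeholders):

``` bash
curl -H 'Docspell-Integration:test123' \
  "http://localhost:7880/api/v1/open/integration/checkfile/family/$(sha256sum some.pdf | cut -d' ' -f1)"
```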

See the [SMTP gateway](tools/smtpgateway) or the [consumedir
script](tools/consumedir) for examples of using this endpoint.

## The Request

@@ -299,8 +299,8 @@ paths:
                $ref: "#/components/schemas/BasicResult"
  /open/integration/item/{id}:
    get:
      tags: [ Upload Integration ]
      summary: Upload files to docspell.
      tags: [ Integration Endpoint ]
      summary: Check if integration endpoint is available.
      description: |
        Allows to check whether an integration endpoint is enabled for
        a collective. The collective is given by the `id` parameter.
@@ -325,7 +325,7 @@ paths:
        401:
          description: Unauthorized
    post:
      tags: [ Upload Integration ]
      tags: [ Integration Endpoint ]
      summary: Upload files to docspell.
      description: |
        Upload a file to docspell for processing. The id is a
@@ -368,6 +368,30 @@ paths:
            application/json:
              schema:
                $ref: "#/components/schemas/BasicResult"
  /open/integration/checkfile/{id}/{checksum}:
    get:
      tags: [ Integration Endpoint ]
      summary: Check if a file is in docspell.
      description: |
        Checks if a file with the given SHA-256 checksum is in
        docspell. The `id` is the *collective name*. This route only
        exists if it is enabled in the configuration file.

        The result shows all items that contain a file with the given
        checksum.
      security:
        - authTokenHeader: []
      parameters:
        - $ref: "#/components/parameters/id"
        - $ref: "#/components/parameters/checksum"
      responses:
        200:
          description: Ok
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/CheckFileResult"

  /open/signup/register:
    post:
      tags: [ Registration ]

@@ -42,7 +42,7 @@ object CheckFileRoutes {
    }
  }

  private def convert(v: Vector[RItem]): CheckFileResult =
  def convert(v: Vector[RItem]): CheckFileResult =
    CheckFileResult(
      v.nonEmpty,
      v.map(r => BasicItem(r.id, r.name, r.direction, r.state, r.created, r.itemDate))

@@ -8,6 +8,7 @@ import docspell.common._
import docspell.restserver.Config
import docspell.restserver.conv.Conversions._
import docspell.restserver.http4s.Responses
import docspell.store.records.RItem
import org.http4s._
import org.http4s.circe.CirceEntityEncoder._
import org.http4s.dsl.Http4sDsl
@@ -24,12 +25,17 @@ object IntegrationEndpointRoutes {
    val dsl = new Http4sDsl[F] {}
    import dsl._

    def validate(req: Request[F], collective: Ident) =
      for {
        _ <- authRequest(req, cfg.integrationEndpoint)
        _ <- checkEnabled(cfg.integrationEndpoint)
        _ <- lookupCollective(collective, backend)
      } yield ()

    HttpRoutes.of {
      case req @ POST -> Root / "item" / Ident(collective) =>
        (for {
          _ <- authRequest(req, cfg.integrationEndpoint)
          _ <- checkEnabled(cfg.integrationEndpoint)
          _ <- lookupCollective(collective, backend)
          _ <- validate(req, collective)
          res <- EitherT.liftF[F, Response[F], Response[F]](
            uploadFile(collective, backend, cfg, dsl)(req)
          )
@@ -37,11 +43,20 @@ object IntegrationEndpointRoutes {

      case req @ GET -> Root / "item" / Ident(collective) =>
        (for {
          _ <- authRequest(req, cfg.integrationEndpoint)
          _ <- checkEnabled(cfg.integrationEndpoint)
          _ <- lookupCollective(collective, backend)
          _ <- validate(req, collective)
          res <- EitherT.liftF[F, Response[F], Response[F]](Ok(()))
        } yield res).fold(identity, identity)

      case req @ GET -> Root / "checkfile" / Ident(collective) / checksum =>
        (for {
          _ <- validate(req, collective)
          items <- EitherT.liftF[F, Response[F], Vector[RItem]](
            backend.itemSearch.findByFileCollective(checksum, collective)
          )
          resp <-
            EitherT.liftF[F, Response[F], Response[F]](Ok(CheckFileRoutes.convert(items)))
        } yield resp).fold(identity, identity)

    }
  }

@@ -13,6 +13,7 @@ CURL_CMD="curl"
INOTIFY_CMD="inotifywait"
SHA256_CMD="sha256sum"
MKTEMP_CMD="mktemp"
CURL_OPTS=${CURL_OPTS:-}

! getopt --test > /dev/null
if [[ ${PIPESTATUS[0]} -ne 4 ]]; then
@@ -20,8 +21,8 @@ if [[ ${PIPESTATUS[0]} -ne 4 ]]; then
    exit 1
fi

OPTIONS=omhdp:vr
LONGOPTS=once,distinct,help,delete,path:,verbose,recursive,dry
OPTIONS=omhdp:vrmi
LONGOPTS=once,distinct,help,delete,path:,verbose,recursive,dry,integration,iuser:,iheader:

! PARSED=$(getopt --options=$OPTIONS --longoptions=$LONGOPTS --name "$0" -- "$@")
if [[ ${PIPESTATUS[0]} -ne 0 ]]; then
@@ -35,6 +36,7 @@ eval set -- "$PARSED"

declare -a watchdir
help=n verbose=n delete=n once=n distinct=n recursive=n dryrun=n
integration=n iuser="" iheader=""
while true; do
    case "$1" in
        -h|--help)
@@ -69,6 +71,19 @@ while true; do
            dryrun=y
            shift
            ;;
        -i|--integration)
            integration=y
            recursive=y
            shift
            ;;
        --iuser)
            iuser="$2"
            shift 2
            ;;
        --iheader)
            iheader="$2"
            shift 2
            ;;
        --)
            shift
            break
@@ -87,14 +102,27 @@ showUsage() {
    echo "Usage: $0 [options] url url ..."
    echo
    echo "Options:"
    echo "  -v | --verbose      Print more to stdout. (value: $verbose)"
    echo "  -d | --delete       Delete the file if successfully uploaded. (value: $delete)"
    echo "  -p | --path <dir>   The directories to watch. This is required. (value: ${watchdir[@]})"
    echo "  -h | --help         Prints this help text. (value: $help)"
    echo "  -m | --distinct     Optional. Upload only if the file doesn't already exist. (value: $distinct)"
    echo "  -o | --once         Instead of watching, upload all files in that dir. (value: $once)"
    echo "  -r | --recursive    Traverse the directory(ies) recursively (value: $recursive)"
    echo "  --dry               Do a 'dry run', not uploading anything only printing to stdout (value: $dryrun)"
    echo "  -v | --verbose      Print more to stdout. (value: $verbose)"
    echo "  -d | --delete       Delete the file if successfully uploaded. (value: $delete)"
    echo "  -p | --path <dir>   The directories to watch. This is required. (value: ${watchdir[@]})"
    echo "  -h | --help         Prints this help text. (value: $help)"
    echo "  -m | --distinct     Optional. Upload only if the file doesn't already exist. (value: $distinct)"
    echo "  -o | --once         Instead of watching, upload all files in that dir. (value: $once)"
    echo "  -r | --recursive    Traverse the directory(ies) recursively (value: $recursive)"
    echo "  -i | --integration  Upload to the integration endpoint. It implies -r. This puts the script in"
    echo "                      a different mode, where the first subdirectory of any given starting point"
    echo "                      is read as the collective name. The url(s) are completed with this name in"
    echo "                      order to upload files to the respective collective. So each directory"
    echo "                      given is expected to contain one subdirectory per collective and the urls"
    echo "                      are expected to identify the integration endpoint, which is"
    echo "                      /api/v1/open/integration/item/<collective-name>. (value: $integration)"
    echo "  --iheader           The header name and value to use with the integration endpoint. This must be"
    echo "                      in form 'headername:value'. Only used if '-i' is supplied."
    echo "                      (value: $iheader)"
    echo "  --iuser             The username and password for basic auth to use with the integration"
    echo "                      endpoint. This must be of form 'user:pass'. Only used if '-i' is supplied."
    echo "                      (value: $iuser)"
    echo "  --dry               Do a 'dry run', not uploading anything only printing to stdout (value: $dryrun)"
    echo ""
    echo "Arguments:"
    echo "  A list of URLs to upload the files to."
@@ -105,6 +133,9 @@ showUsage() {
    echo "Example: Upload all files in a directory"
    echo "$0 --path ~/Downloads -m -dv --once http://localhost:7880/api/v1/open/upload/item/abcde-12345-abcde-12345"
    echo ""
    echo "Example: Integration Endpoint"
    echo "$0 -i -iheader 'Docspell-Integration:test123' -m -p ~/Downloads/ http://localhost:7880/api/v1/open/integration/item"
    echo ""
}

if [ "$help" = "y" ]; then
@@ -127,32 +158,67 @@ fi

trace() {
    if [ "$verbose" = "y" ]; then
        echo "$1"
        >&2 echo "$1"
    fi
}

info() {
    echo $1
    >&2 echo $1
}

getCollective() {
    file="$(realpath -e $1)"
    dir="$(realpath -e $2)"
    collective=${file#"$dir"}
    coll=$(echo $collective | cut -d'/' -f1)
    if [ -z "$coll" ]; then
        coll=$(echo $collective | cut -d'/' -f2)
    fi
    echo $coll
}


upload() {
    dir="$(realpath -e $1)"
    file="$(realpath -e $2)"
    url="$3"
    OPTS="$CURL_OPTS"
    if [ "$integration" = "y" ]; then
        collective=$(getCollective "$file" "$dir")
        trace "- upload: collective = $collective"
        url="$url/$collective"
        if [ $iuser ]; then
            OPTS="$OPTS --user $iuser"
        fi
        if [ $iheader ]; then
            OPTS="$OPTS -H $iheader"
        fi
    fi
    if [ "$dryrun" = "y" ]; then
        info "Not uploading (dry-run) $1 to $2"
        info "- Not uploading (dry-run) $file to $url with opts $OPTS"
    else
        tf=$($MKTEMP_CMD) rc=0
        $CURL_CMD -# -o "$tf" --stderr "$tf" -w "%{http_code}" -XPOST -F file=@"$1" "$2" | (2>&1 1>/dev/null grep 200)
        rc=$(expr $rc + $?)
        cat $tf | (2>&1 1>/dev/null grep '{"success":true')
        rc=$(expr $rc + $?)
        if [ $rc -ne 0 ]; then
        trace "- Uploading $file to $url with options $OPTS"
        tf1=$($MKTEMP_CMD) tf2=$($MKTEMP_CMD) rc=0
        $CURL_CMD --fail -# -o "$tf1" --stderr "$tf2" $OPTS -XPOST -F file=@"$file" "$url"
        if [ $? -ne 0 ]; then
            info "Upload failed. Exit code: $rc"
            cat "$tf"
            cat "$tf1"
            cat "$tf2"
            echo ""
            rm "$tf"
            rm "$tf1" "$tf2"
            return $rc
        else
            rm "$tf"
            return 0
            if cat $tf1 | grep -q '{"success":false'; then
                echo "Upload failed. Message from server:"
                cat "$tf1"
                echo ""
                rm "$tf1" "$tf2"
                return 1
            else
                info "- Upload done."
                rm "$tf1" "$tf2"
                return 0
            fi
        fi
    fi
}
@@ -162,28 +228,69 @@ checksum() {
}

checkFile() {
    local url=$(echo "$1" | sed 's,upload/item,checkfile,g')
    local url="$1"
    local file="$2"
    trace "Check file: $url/$(checksum "$file")"
    $CURL_CMD -XGET -s "$url/$(checksum "$file")" | (2>&1 1>/dev/null grep '"exists":true')
    local dir="$3"
    OPTS="$CURL_OPTS"
    if [ "$integration" = "y" ]; then
        collective=$(getCollective "$file" "$dir")
        url="$url/$collective"
        url=$(echo "$url" | sed 's,/item/,/checkfile/,g')
        if [ $iuser ]; then
            OPTS="$OPTS --user $iuser"
        fi
        if [ $iheader ]; then
            OPTS="$OPTS -H $iheader"
        fi
    else
        url=$(echo "$1" | sed 's,upload/item,checkfile,g')
    fi
    trace "- Check file: $url/$(checksum $file)"
    tf1=$($MKTEMP_CMD) tf2=$($MKTEMP_CMD)

    $CURL_CMD --fail -o "$tf1" --stderr "$tf2" $OPTS -XGET -s "$url/$(checksum "$file")"
    if [ $? -ne 0 ]; then
        info "Checking file failed!"
        cat "$tf1" >&2
        cat "$tf2" >&2
        info ""
        rm "$tf1" "$tf2"
        echo "failed"
        return 1
    else
        if cat "$tf1" | grep -q '{"exists":true'; then
            rm "$tf1" "$tf2"
            echo "y"
        else
            rm "$tf1" "$tf2"
            echo "n"
        fi
    fi
}

process() {
    file="$1"
    file="$(realpath -e $1)"
    dir="$2"
    info "---- Processing $file ----------"
    declare -i curlrc=0
    set +e
    for url in $urls; do
        if [ "$distinct" = "y" ]; then
            trace "- Checking if $file has been uploaded to $url already"
            checkFile "$url" "$file"
            if [ $? -eq 0 ]; then
            res=$(checkFile "$url" "$file" "$dir")
            rc=$?
            curlrc=$(expr $curlrc + $rc)
            trace "- Result from checkfile: $res"
            if [ "$res" = "y" ]; then
                info "- Skipping file '$file' because it has been uploaded in the past."
                continue
            elif [ "$res" != "n" ]; then
                info "- Checking file failed, skipping the file."
                continue
            fi
        fi
        trace "- Uploading '$file' to '$url'."
        upload "$file" "$url"
        upload "$dir" "$file" "$url"
        rc=$?
        curlrc=$(expr $curlrc + $rc)
        if [ $rc -ne 0 ]; then
@@ -207,6 +314,16 @@ process() {
    fi
}

findDir() {
    path="$1"
    for dir in "${watchdir[@]}"; do
        if [[ $path = ${dir}* ]]
        then
            echo $dir
        fi
    done
}

if [ "$once" = "y" ]; then
    info "Uploading all files in '$watchdir'."
    MD="-maxdepth 1"
@@ -215,7 +332,7 @@ if [ "$once" = "y" ]; then
    fi
    for dir in "${watchdir[@]}"; do
        find "$dir" $MD -type f -print0 | while IFS= read -d '' -r file; do
            process "$file"
            process "$file" "$dir"
        done
    done
else
@@ -225,8 +342,9 @@ else
    fi
    $INOTIFY_CMD $REC -m "${watchdir[@]}" -e close_write -e moved_to |
        while read path action file; do
            trace "The file '$file' appeared in directory '$path' via '$action'"
            dir=$(findDir "$path")
            trace "The file '$file' appeared in directory '$path' below '$dir' via '$action'"
            sleep 1
            process "$path$file"
            process "$(realpath -e "$path$file")" "$dir"
        done
fi