2022-05-16 13:01:28 +00:00
|
|
|
+++
|
|
|
|
title = "Addon for audio file support"
|
|
|
|
[extra]
|
|
|
|
author = "eikek"
|
|
|
|
+++
|
|
|
|
|
2022-05-22 11:10:00 +00:00
|
|
|
# Audio file support
|
2022-05-16 13:01:28 +00:00
|
|
|
|
|
|
|
Since version 0.36.0 Docspell can be extended by
|
|
|
|
[addons](@/docs/addons/basics.md) - external programs that are
|
|
|
|
executed at some defined point in Docspell. This is a walk through the
|
|
|
|
first addon that was created, mainly as an example: providing support
|
|
|
|
for audio files.
|
|
|
|
|
|
|
|
<!-- more -->
|
|
|
|
|
|
|
|
I think it is interesting to provide support for audio files for a
|
|
|
|
DMS, although admittedly I don't have much of a use :). But this is
|
|
|
|
the kind of use-case that addons are for.
|
|
|
|
|
|
|
|
# The idea
|
|
|
|
|
|
|
|
The idea is very simple: the real work is done by external programs,
|
|
|
|
most notably [coqui's stt](https://github.com/coqui-ai/STT) a deep
|
|
|
|
learning toolkit originally created at Mozilla. It provides a command
|
|
|
|
line tool that accepts a WAV file and spits out text. Perfect!
|
|
|
|
|
|
|
|
With this text, a PDF file can be created and a preview image which is
|
|
|
|
already enough for basic support. You can see the pdf in the web-ui
|
|
|
|
and search for the text via SOLR or PostgreSQL.
|
|
|
|
|
|
|
|
Because a WAV file is not the most popular format today, `ffmpeg` can
|
|
|
|
be used to transform any other audio to WAV.
|
|
|
|
|
|
|
|
The only thing now is to create a program that checks the uploaded
|
|
|
|
files, filters out all audio files and runs them through the mentioned
|
|
|
|
programs. So let's do this.
|
|
|
|
|
|
|
|
# Preparation
|
|
|
|
|
|
|
|
Addons are external programs and can be written in whatever language….
|
|
|
|
For me this is a good opportunity to refresh my rusty scheme know-how
|
|
|
|
a bit. So this addon is written in Scheme, in particular
|
|
|
|
[guile](https://www.gnu.org/software/guile/). Programming in scheme is
|
|
|
|
fun and guile provides good integration into the (posix) OS and also
|
|
|
|
has a nice JSON module. I had the [reference
|
|
|
|
docs](https://www.gnu.org/software/guile/docs/docs-2.2/guile-ref/index.html)
|
|
|
|
open all the time - look at them for further details on the used
|
|
|
|
functions.
|
|
|
|
|
|
|
|
It's usually good to play around with the tools at first. For stt, we
|
|
|
|
first need to download a *model*. This will be used to "detect" the
|
|
|
|
text in the audio data. They have a [page](https://coqui.ai/models)
|
|
|
|
where we can download model files for any supported language. For the
|
|
|
|
addon, we will implement English and German.
|
|
|
|
|
|
|
|
When creating a PDF with wkhtmltopdf, we prettify it a little by
|
|
|
|
embedding the plain text into some html template. This will also take
|
|
|
|
care to specifiy UTF-8 as default encoding directly in the HTML
|
|
|
|
template.
|
|
|
|
|
|
|
|
FFMpeg just works as usual. It figures out the input format
|
|
|
|
automatically and knows from the extension of the output file what to
|
|
|
|
do.
|
|
|
|
|
|
|
|
You can find the full code
|
|
|
|
[here](https://github.com/docspell/audio-files-addon/blob/master/src/addon.scm).
|
|
|
|
The following shows excerpts from it with some explanation.
|
|
|
|
|
|
|
|
# The script
|
|
|
|
|
|
|
|
## Helpers
|
|
|
|
|
|
|
|
After the preamble, there are two helper functions.
|
|
|
|
|
|
|
|
```lisp
|
|
|
|
(define* (errln formatstr . args)
|
|
|
|
(apply format (current-error-port) formatstr args)
|
|
|
|
(newline))
|
|
|
|
|
|
|
|
;; Macro for executing system commands and making this program exit in
|
|
|
|
;; case of failure.
|
|
|
|
(define-syntax sysexec
|
|
|
|
(syntax-rules ()
|
|
|
|
((sysexec exp ...)
|
|
|
|
(let ((rc (apply system* (list exp ...))))
|
|
|
|
(unless (eqv? rc EXIT_SUCCESS)
|
|
|
|
(format (current-error-port) "> '~a …' failed with: ~#*~:*~d~%" exp ... rc)
|
|
|
|
(exit 1))
|
|
|
|
#t))))
|
|
|
|
```
|
|
|
|
|
|
|
|
As this addon wants to pass data back to Docspell via stdout, we use
|
|
|
|
the stderr for logging and printing general information. The function
|
|
|
|
`errln` (short for "error line" :)) allows to conveniently print to
|
|
|
|
stderr and the second wraps the `system*` procedure such that the
|
|
|
|
script fails whenever the external program fails. It is somewhat
|
|
|
|
similar to `set -e` in bash.
|
|
|
|
|
|
|
|
## Dependencies
|
|
|
|
|
|
|
|
Next is the declaration of external dependencies. At first all
|
|
|
|
external programs are listed. This is important for later, when the
|
|
|
|
script is packaged via nix. Nix will substitute these commands with
|
|
|
|
absolute paths. Then it's good to not have them scattered around.
|
|
|
|
|
|
|
|
It also reads in the expected environment variables (only those we
|
|
|
|
need) that are provided by Docspell. Since this addon only makes sense
|
|
|
|
to work on an item, it quits early should some env vars are missing.
|
|
|
|
|
|
|
|
```lisp
|
|
|
|
(define *curl* "curl")
|
|
|
|
(define *ffmpeg* "ffmpeg")
|
|
|
|
(define *stt* "stt")
|
|
|
|
(define *wkhtmltopdf* "wkhtmltopdf")
|
|
|
|
|
|
|
|
;; Getting some environment variables
|
|
|
|
(define *output-dir* (getenv "OUTPUT_DIR"))
|
|
|
|
(define *tmp-dir* (getenv "TMP_DIR"))
|
|
|
|
(define *cache-dir* (getenv "CACHE_DIR"))
|
|
|
|
|
|
|
|
(define *item-data-json* (getenv "ITEM_DATA_JSON"))
|
|
|
|
(define *original-files-json* (getenv "ITEM_ORIGINAL_JSON"))
|
|
|
|
(define *original-files-dir* (getenv "ITEM_ORIGINAL_DIR"))
|
|
|
|
|
|
|
|
;; fail early if not in the right context
|
|
|
|
(when (not *item-data-json*)
|
|
|
|
(errln "No item data json file found.")
|
|
|
|
(exit 1))
|
|
|
|
```
|
|
|
|
|
|
|
|
## Input/Output
|
|
|
|
|
|
|
|
The input and output schemas can be defined now. This uses the
|
|
|
|
[guile-json](https://github.com/aconchillo/guile-json) module. It
|
|
|
|
provides very convenient features for reading and writing json.
|
|
|
|
|
|
|
|
It is possible to define a record via `define-json-type` that
|
|
|
|
generates readers and writers to/from JSON. For example, the record
|
|
|
|
`<itemdata>` is defined to be an object with only one field `id`. The
|
|
|
|
function `json->scm` reads in json into scheme datastructures and then
|
|
|
|
the generated function `scm->itemdata` creates the record from it. For
|
|
|
|
every record, accessor functions exists. For example: `(itemdata-id
|
|
|
|
data)` would lookup the field `id` in the given itemdata record
|
|
|
|
`data`.
|
|
|
|
|
|
|
|
Here we need it to get the item-id and the list of file properties
|
|
|
|
belonging to the original uploaded files.
|
|
|
|
|
|
|
|
Another interesting definition is the `<output>` record. This captures
|
|
|
|
(a subset of) the schema of what Docspell receives from this addon as
|
|
|
|
a result. A full example of this data is
|
|
|
|
[here](@/docs/addons/writing.md#output). We don't need `commands` or
|
|
|
|
`newItems`, so this schema only cares about the `files` attribute.
|
|
|
|
|
|
|
|
|
|
|
|
```lisp
|
|
|
|
(define-json-type <itemdata>
|
|
|
|
(id))
|
|
|
|
|
|
|
|
;; The array of original files
|
|
|
|
(define-json-type <original-file>
|
|
|
|
(id)
|
|
|
|
(name)
|
|
|
|
(position)
|
|
|
|
(language)
|
|
|
|
(mimetype)
|
|
|
|
(length)
|
|
|
|
(checksum))
|
|
|
|
|
|
|
|
;; The output record, what is returned to docspell
|
|
|
|
(define-json-type <itemfiles>
|
|
|
|
(itemId)
|
|
|
|
(textFiles)
|
|
|
|
(pdfFiles))
|
|
|
|
(define-json-type <output>
|
|
|
|
(files "files" #(<itemfiles>)))
|
|
|
|
|
|
|
|
;; Parses the JSON containing the item information
|
|
|
|
(define *itemdata-json*
|
|
|
|
(scm->itemdata (call-with-input-file *item-data-json* json->scm)))
|
|
|
|
|
|
|
|
;; The JSON file containing meta data for all source files as vector.
|
|
|
|
(define *original-meta-json*
|
|
|
|
(let ((props (vector->list (call-with-input-file *original-files-json* json->scm))))
|
|
|
|
(map scm->original-file props)))
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
## Finding the audio file
|
|
|
|
|
|
|
|
The previously parsed json array `*original-meta-json*` can now be
|
|
|
|
used to find any audio files within the original uploaded files, as
|
|
|
|
done in `find-audio-files`. It simply goes through the list and keeps
|
|
|
|
those files whose mimetype starts with `audio/`. The mimetype is
|
|
|
|
provided by Docspell in the file properties in `ITEM_ORIGINAL_JSON`.
|
|
|
|
|
|
|
|
Before converting to wav with ffmpeg, it is quickly checked if it's
|
|
|
|
not a wav already.
|
|
|
|
|
|
|
|
|
|
|
|
```lisp
|
|
|
|
(define (is-wav? mime)
|
|
|
|
"Test whether the mimetype MIME is denoting a wav file."
|
|
|
|
(or (string-suffix? "/wav" mime)
|
|
|
|
(string-suffix? "/x-wav" mime)
|
|
|
|
(string-suffix? "/vnd.wav" mime)))
|
|
|
|
|
|
|
|
(define (find-audio-files)
|
|
|
|
"Find all source files that are audio files."
|
|
|
|
(filter! (lambda (el)
|
|
|
|
(string-prefix?
|
|
|
|
"audio/"
|
|
|
|
(original-file-mimetype el)))
|
|
|
|
*original-meta-json*))
|
|
|
|
|
|
|
|
(define (convert-wav id mime)
|
|
|
|
"Run ffmpeg to convert to wav."
|
|
|
|
(let ((src-file (format #f "~a/~a" *original-files-dir* id))
|
|
|
|
(out-file (format #f "~a/in.wav" *tmp-dir*)))
|
|
|
|
(if (is-wav? mime)
|
|
|
|
src-file
|
|
|
|
(begin
|
|
|
|
(errln "Running ffmpeg to convert wav file...")
|
|
|
|
(sysexec *ffmpeg* "-loglevel" "error" "-y" "-i" src-file out-file)
|
|
|
|
out-file))))
|
|
|
|
```
|
|
|
|
|
|
|
|
## Speech to text
|
|
|
|
|
|
|
|
Once we have a wav file, we can run speech-to-text recognition on it.
|
|
|
|
As said above, we need to download a model first, which is depending
|
|
|
|
on a language. Luckily, Docspell provides the language of the file.
|
|
|
|
This is the lanugage either given directly by the user when uploading
|
|
|
|
or it's the collective's default language.
|
|
|
|
|
|
|
|
In the following snippet, we get the language as arguments. We will
|
|
|
|
get it later from the file properties.
|
|
|
|
|
|
|
|
As seen below, the model file is stored to the `CACHE_DIR`. This is
|
|
|
|
provided by Docspell and will survive the execution of this script.
|
|
|
|
All other directories involved will be deleted eventually. The
|
|
|
|
`CACHE_DIR` is the place to store intermediate results you don't want
|
|
|
|
to loose between addon runs. But as any cache, it may not exist the
|
|
|
|
next time the addon is run. Docspell doesn't clear it automatically,
|
|
|
|
though.
|
|
|
|
|
|
|
|
The last function simply executes the `stt` external command and puts
|
|
|
|
stdout into a file.
|
|
|
|
|
|
|
|
```lisp
|
|
|
|
(define (get-model language)
|
|
|
|
(let* ((lang (or language "eng"))
|
|
|
|
(file (format #f "~a/model_~a.pbmm" *cache-dir* lang)))
|
|
|
|
(unless (file-exists? file)
|
|
|
|
(download-model lang file))
|
|
|
|
file))
|
|
|
|
|
|
|
|
(define (download-model lang file)
|
|
|
|
"Download model files per language. Nix has currently stt 0.9.3 packaged."
|
|
|
|
(let ((url (cond
|
|
|
|
((string= lang "eng") "https://coqui.gateway.scarf.sh/english/coqui/v0.9.3/model.pbmm")
|
|
|
|
((string= lang "deu") "https://coqui.gateway.scarf.sh/german/AASHISHAG/v0.9.0/model.pbmm")
|
|
|
|
(else (error "Unsupported language: " lang)))))
|
|
|
|
(errln "Downloading model file for language: ~a" lang)
|
|
|
|
(sysexec *curl* "-SsL" "-o" file url)
|
|
|
|
file))
|
|
|
|
|
|
|
|
(define (extract-text model input out)
|
|
|
|
"Runs stt for speech-to-text and writes the text into the file OUT."
|
|
|
|
(errln "Extracting text from audio…")
|
|
|
|
(with-output-to-file out
|
|
|
|
(lambda ()
|
|
|
|
(sysexec *stt* "--model" model "--audio" input))))
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
## Create PDF
|
|
|
|
|
|
|
|
Creating the PDF is straight forward. The extracted text is embedded
|
|
|
|
into a HTML file which is then passed to `wkhtmltopdf`. Since we don't
|
|
|
|
need this file for anything else, it is stored to the `TMP_DIR`.
|
|
|
|
|
|
|
|
```lisp
|
|
|
|
(define (create-pdf txt-file out)
|
|
|
|
(define (line str)
|
|
|
|
(format #t "~a\n" str))
|
|
|
|
(errln "Creating pdf file…")
|
|
|
|
(let ((tmphtml (format #f "~a/text.html" *tmp-dir*)))
|
|
|
|
(with-output-to-file tmphtml
|
|
|
|
(lambda ()
|
|
|
|
(line "<!DOCTYPE html>")
|
|
|
|
(line "<html>")
|
|
|
|
(line " <head><meta charset=\"UTF-8\"></head>")
|
|
|
|
(line " <body style=\"padding: 2em; font-size: large;\">")
|
|
|
|
(line " <div style=\"padding: 0.5em; font-size:normal; font-weight: bold; border: 1px solid black;\">")
|
|
|
|
(line " Extracted from audio using stt on ")
|
|
|
|
(display (strftime "%c" (localtime (current-time))))
|
|
|
|
(line " </div>")
|
|
|
|
(line " <p>")
|
|
|
|
(display (call-with-input-file txt-file read-string))
|
|
|
|
(line " </p>")
|
|
|
|
(line "</body></html>")))
|
|
|
|
(sysexec *wkhtmltopdf* tmphtml out)))
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
## Putting it together
|
|
|
|
|
|
|
|
The main function now puts everything together. The `process-file`
|
|
|
|
function is called for every file that is returned from
|
|
|
|
`(find-audio-files)`. It will extract the necessary information (like
|
|
|
|
the language) from the json document via record accessors (e.g.
|
|
|
|
`original-file-lanugage file)`) and then calls the functions defined
|
|
|
|
above. At last it creates a `<itemfile>` record with `make-itemfiles`.
|
|
|
|
|
|
|
|
An `<itemfile>` record contains now the important information for
|
|
|
|
Docspell. It requires the item-id and a mapping from attachment-ids to
|
|
|
|
files in `OUTPUT_DIR`. For each attachment identified by its ID,
|
|
|
|
Docspell replaces the extracted text with the contents of the given
|
|
|
|
file and replaces the converted PDF file, respectively. In the code
|
|
|
|
below, two lists of such mappings are defined - the first for the text
|
|
|
|
files, the second for the converted pdf. The files must be specified
|
|
|
|
relative to `OUTPUT_DIR`.
|
|
|
|
|
|
|
|
That means `process-all` returns a list of `<itemfile>` records which
|
|
|
|
is then used to create the `<output>` record. And finally, a
|
|
|
|
`output->json` function will turn the record into proper JSON which is
|
|
|
|
send to stdout.
|
|
|
|
|
|
|
|
```lisp
|
|
|
|
(define (process-file itemid file)
|
|
|
|
"Processing a single audio file."
|
|
|
|
(let* ((id (original-file-id file))
|
|
|
|
(mime (original-file-mimetype file))
|
|
|
|
(lang (original-file-language file))
|
|
|
|
(txt-file (format #f "~a/~a.txt" *output-dir* id))
|
|
|
|
(pdf-file (format #f "~a/~a.pdf" *output-dir* id))
|
|
|
|
(wav (convert-wav id mime))
|
|
|
|
(model (get-model lang)))
|
|
|
|
(extract-text model wav txt-file)
|
|
|
|
(create-pdf txt-file pdf-file)
|
|
|
|
(make-itemfiles itemid
|
|
|
|
`((,id . ,(format #f "~a.txt" id)))
|
|
|
|
`((,id . ,(format #f "~a.pdf" id))))))
|
|
|
|
|
|
|
|
(define (process-all)
|
|
|
|
(let ((item-id (itemdata-id *itemdata-json*)))
|
|
|
|
(map (lambda (file)
|
|
|
|
(process-file item-id file))
|
|
|
|
(find-audio-files))))
|
|
|
|
|
|
|
|
(define (main args)
|
|
|
|
(let ((out (make-output (process-all))))
|
|
|
|
(format #t "~a" (output->json out))))
|
|
|
|
```
|
|
|
|
|
|
|
|
Example output:
|
|
|
|
|
|
|
|
```json
|
|
|
|
{
|
|
|
|
"files": [
|
|
|
|
{
|
|
|
|
"itemId":"qZDnyGIAJsXr",
|
|
|
|
"textFiles": { "HPFvIDib6eA": "HPFvIDib6eA.txt" },
|
|
|
|
"pdfFiles": { "HPFvIDib6eA": "HPFvIDib6eA.pdf"}
|
|
|
|
}
|
|
|
|
]
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
|
|
|
# Packaging
|
|
|
|
|
|
|
|
Now with that script some additional plumbing is needed to make it an
|
|
|
|
"Addon" for Docspell.
|
|
|
|
|
|
|
|
The external tools - stt, ffmpeg, curl and wkhtmltopdf are required as
|
|
|
|
well as guile to compile and interpret the script. Also the guile-json
|
|
|
|
module must be installed.
|
|
|
|
|
|
|
|
This can turn into a quite tedious task. Luckily, there is
|
|
|
|
[nix](https://nixos.org) that has an answer to this. A user who wants
|
|
|
|
to use this script only needs to install nix. This package manager
|
|
|
|
then takes care of providing the exact dependencies we need (down to
|
|
|
|
the correct version and including guile as the language and runtime).
|
|
|
|
|
|
|
|
## A flake
|
|
|
|
|
|
|
|
Everything is defined in the `flake.nix` in the source root. It looks
|
|
|
|
like this:
|
|
|
|
|
|
|
|
```nix
|
|
|
|
{
|
|
|
|
description = "A docspell addon for basic audio file support";
|
|
|
|
|
|
|
|
inputs = {
|
|
|
|
utils.url = "github:numtide/flake-utils";
|
|
|
|
|
|
|
|
# Nixpkgs / NixOS version to use.
|
|
|
|
nixpkgs.url = "nixpkgs/nixos-21.11";
|
|
|
|
};
|
|
|
|
|
|
|
|
outputs = { self, nixpkgs, utils }:
|
|
|
|
utils.lib.eachSystem ["x86_64-linux"] (system:
|
|
|
|
let
|
|
|
|
pkgs = import nixpkgs {
|
|
|
|
inherit system;
|
|
|
|
overlays = [
|
|
|
|
|
|
|
|
];
|
|
|
|
};
|
|
|
|
name = "audio-files-addon";
|
|
|
|
in rec {
|
|
|
|
packages.${name} = pkgs.callPackage ./nix/addon.nix {
|
|
|
|
inherit name;
|
|
|
|
};
|
|
|
|
|
|
|
|
defaultPackage = packages.${name};
|
|
|
|
|
|
|
|
apps.${name} = utils.lib.mkApp {
|
|
|
|
inherit name;
|
|
|
|
drv = packages.${name};
|
|
|
|
};
|
|
|
|
defaultApp = apps.${name};
|
|
|
|
|
|
|
|
## … omitted for brevity
|
|
|
|
}
|
|
|
|
);
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
|
|
|
First sad thing is, that only `x86_64` systems are supported. This is
|
|
|
|
due to `stt` not being available on other platforms currently (as
|
|
|
|
provided by nixpkgs).
|
|
|
|
|
|
|
|
The rest is a bit magic: A package and "defaultPackage" is defined
|
|
|
|
with a reference to `nix/addon.nix`. The important part is the line
|
|
|
|
|
|
|
|
```nix
|
|
|
|
inputs = {
|
|
|
|
# Nixpkgs / NixOS version to use.
|
|
|
|
nixpkgs.url = "nixpkgs/nixos-21.11";
|
|
|
|
};
|
|
|
|
```
|
|
|
|
|
|
|
|
It says that as input for "building" the script, we take all of
|
|
|
|
[nixpkgs](https://github.com/NixOS/nixpkgs) which is a package
|
|
|
|
collection defined for (and in) nix - including thousands of software
|
|
|
|
packages. We can pick and choose from these. No surprise, all external
|
|
|
|
tools we need are included!
|
|
|
|
|
|
|
|
A flake defines the inputs and outputs of a package. With all of
|
|
|
|
nixpkgs as inputs, we can create a definition to elevate this script
|
|
|
|
into a *package*.
|
|
|
|
|
|
|
|
## Package definition
|
|
|
|
|
|
|
|
The definition for "building" the script is in `nix/addon.nix`:
|
|
|
|
|
|
|
|
```nix
|
|
|
|
{ stdenv, bash, cacert, curl, stt, wkhtmltopdf, ffmpeg, guile, guile-json, lib, name }:
|
|
|
|
|
|
|
|
stdenv.mkDerivation {
|
|
|
|
inherit name;
|
|
|
|
src = lib.sources.cleanSource ../.;
|
|
|
|
|
|
|
|
buildInputs = [ guile guile-json ];
|
|
|
|
|
|
|
|
patchPhase = ''
|
|
|
|
TARGET=src/addon.scm
|
|
|
|
sed -i 's,\*curl\* "curl",\*curl\* "${curl}/bin/curl",g' $TARGET
|
|
|
|
sed -i 's,\*ffmpeg\* "ffmpeg",\*ffmpeg\* "${ffmpeg}/bin/ffmpeg",g' $TARGET
|
|
|
|
sed -i 's,\*stt\* "stt",\*stt\* "${stt}/bin/stt",g' $TARGET
|
|
|
|
sed -i 's,\*wkhtmltopdf\* "wkhtmltopdf",\*wkhtmltopdf\* "${wkhtmltopdf}/bin/wkhtmltopdf",g' $TARGET
|
|
|
|
'';
|
|
|
|
|
|
|
|
buildPhase = ''
|
|
|
|
guild compile -o ${name}.go src/addon.scm
|
|
|
|
'';
|
|
|
|
|
|
|
|
# module name must be same as <filename>.go
|
|
|
|
installPhase = ''
|
|
|
|
mkdir -p $out/{bin,lib}
|
|
|
|
cp ${name}.go $out/lib/
|
|
|
|
|
|
|
|
cat > $out/bin/${name} <<-EOF
|
|
|
|
#!${bash}/bin/bash
|
|
|
|
export SSL_CERT_FILE="${cacert}/etc/ssl/certs/ca-bundle.crt"
|
|
|
|
exec -a "${name}" ${guile}/bin/guile -C ${guile-json}/share/guile/ccache -C $out/lib -e '(${name}) main' -c "" \$@
|
|
|
|
EOF
|
|
|
|
chmod +x $out/bin/${name}
|
|
|
|
'';
|
|
|
|
}
|
|
|
|
```
|
|
|
|
|
|
|
|
With a bit of handwaving - this is a bash script that modifies
|
|
|
|
slightly the scheme script and runs a compile on it. We simply declare
|
|
|
|
all packages we need in the first line of `{ … }` - these are
|
|
|
|
arguments that are automatically filled by nix by searching the
|
|
|
|
corresponding package in nixpkgs.
|
|
|
|
|
|
|
|
First the `patchPhase` is executed. It will replace the variables
|
|
|
|
containing the external tools with an absolute path to the version
|
|
|
|
that we currently get from nixpkgs. With this step nix takes care that
|
|
|
|
all these packages are available *at runtime* when executing the
|
|
|
|
script. All versions are finally fixed in `flake.lock` and can be
|
|
|
|
upgraded manually.
|
|
|
|
|
|
|
|
The `buildPhase` runs the guile compiler that produces some
|
|
|
|
intermediate code that will be loaded instead of compiling the script
|
|
|
|
on-the-fly.
|
|
|
|
|
|
|
|
At last, `installPhase` creates a wrapper script that runs guile with
|
|
|
|
the correct load-path pointing to `guile-json` and to our pre-compiled
|
|
|
|
script. Additionally, trusted root certificates are exported to make
|
|
|
|
the curl commands work. This script will be created in `$out`
|
|
|
|
directory that is provided by nix.
|
|
|
|
|
|
|
|
If you now run `nix build` in the source root, it will execute all
|
|
|
|
these phases and produce a symlink pointing to the result. You can
|
|
|
|
then `cat` the resulting file if you are curious.
|
|
|
|
|
|
|
|
This way the script is completely isolated from the system it runs
|
|
|
|
on - as long as the nix package manager is available. It includes all
|
|
|
|
the external tools, as well as the underlying runtime (guile)! The
|
|
|
|
result is a tiny wrapper bash script that can be run "everywhere"
|
|
|
|
(modulo all the restrictions, like non-x86_64 platforms, of course
|
|
|
|
:)).
|
|
|
|
|
|
|
|
|
|
|
|
## Addon Descriptor
|
|
|
|
|
|
|
|
At last, a small yaml file is needed to tell Docspell a little about
|
|
|
|
the addon.
|
|
|
|
|
|
|
|
```yaml
|
|
|
|
meta:
|
|
|
|
name: "audio-files-addon"
|
|
|
|
version: "0.1.0"
|
|
|
|
description: |
|
|
|
|
This addon adds support for audio files. Audio files are processed
|
|
|
|
by a speech-to-text engine and a pdf is generated.
|
|
|
|
|
|
|
|
It doesn't expect any user arguments at the moment. It requires
|
|
|
|
internet access to download model files.
|
|
|
|
|
|
|
|
triggers:
|
|
|
|
- final-process-item
|
|
|
|
- final-reprocess-item
|
|
|
|
- existing-item
|
|
|
|
|
|
|
|
runner:
|
|
|
|
nix:
|
|
|
|
enable: true
|
|
|
|
|
|
|
|
docker:
|
|
|
|
enable: false
|
|
|
|
|
|
|
|
trivial:
|
|
|
|
enable: true
|
|
|
|
exec: src/addon.scm
|
|
|
|
|
|
|
|
options:
|
|
|
|
networking: true
|
|
|
|
collectOutput: true
|
|
|
|
```
|
|
|
|
|
|
|
|
This tells Docspell via `triggers` when this addon may be run. This
|
|
|
|
one only makes sense for an item. Thus it can be hooked up to run with
|
|
|
|
every file-processing job or a user can manually trigger it on an
|
|
|
|
item.
|
|
|
|
|
|
|
|
It also tells via `runner:` that it can be build and run via nix, but
|
|
|
|
not via docker (I gave up after an hour to create a Dockerfile…). It
|
|
|
|
could also be run "as-is" but the user then needs to install all these
|
|
|
|
tools and guile manually.
|
|
|
|
|
|
|
|
# Done
|
|
|
|
|
|
|
|
That's it. You can install this addon in Docspell and create a run
|
|
|
|
configuration to let it execute when you want.
|