Add some docs to file backends

eikek 2022-03-12 01:20:06 +01:00
parent 290b4ca58b
commit b71085761b
3 changed files with 179 additions and 5 deletions

View File

@ -8,21 +8,20 @@ package docspell.joex.hk
import cats.effect._
import cats.implicits._
import docspell.common._
import docspell.joex.Config
import docspell.joex.scheduler.Task
import docspell.store.records._
import docspell.store.usertask.UserTaskScope
import com.github.eikek.calev._
import docspell.backend.ops.OFileRepository
object HouseKeepingTask {
private val periodicId = Ident.unsafe("docspell-houskeeping")
val taskName: Ident = Ident.unsafe("housekeeping")
def apply[F[_]: Async](cfg: Config): Task[F, Unit, Unit] =
def apply[F[_]: Async](cfg: Config, fileRepo: OFileRepository[F]): Task[F, Unit, Unit] =
Task
.log[F, Unit](_.info(s"Running house-keeping task now"))
.flatMap(_ => CleanupInvitesTask(cfg.houseKeeping.cleanupInvites))

View File

@ -160,6 +160,22 @@ enabled by providing a secret:
This secret must be provided to all requests to a `/api/v1/admin/`
endpoint.
The most convenient way to execute admin tasks is to use the
[cli](@/docs/tools/cli.md). You get a list of possible admin commands
via `dsc admin help`.
There are three ways to see the output of the commands:
1. Look at the joex logs, which give the most detail.
2. Use the job-queue page when logged in as `docspell-system`.
3. Set up a [webhook](@/docs/webapp/notification.md) to be notified
when a job finishes. This way you get a short message.
All admin tasks (and also some other system tasks) are run under the
account `docspell-system` (collective and user). You need to create
this account and set up the notification hooks there, not in your
normal account.
## Full-Text Search: SOLR
@ -204,6 +220,12 @@ a call:
$ curl -XPOST -H "Docspell-Admin-Secret: test123" http://localhost:7880/api/v1/admin/fts/reIndexAll
```
or use the [cli](@/docs/tools/cli.md):
```bash
dsc admin -a test123 recreate-index
```
Here the `test123` is the key defined with `admin-endpoint.secret`. If
it is empty (the default), this call is disabled (all admin routes).
Otherwise, the POST request will submit a system task that is executed
@ -445,6 +467,147 @@ If you find that these methods do not suffice for your case, please
open an issue.
## File Backends
Docspell lets you choose between different storage backends for
binary files:
1. *Database (the recommended default)*
The database can be used to store the files as well. It is the
default. It doesn't require any other configuration and works well
with multiple instances of restservers and joex nodes.
2. *S3*
The S3 backend stores files in an S3-compatible storage. It was
tested with MinIO, which can be self-hosted.
3. *Filesystem*
The filesystem can also be used directly, by specifying a
directory. Be aware that _all_ nodes must have read and write
access to this directory! When running multiple nodes over a
network, consider using one of the above instead. Docspell uses a
fixed structure for storing the files below the given directory; this
structure cannot be configured.
When using S3 or the filesystem, remember to back up the database
*and* the files!
Note that Docspell stores not only the files that are uploaded, but
also some other files for internal use.
### Configuring
{% warningbubble(title="Note") %}
Each node must have the same config for its file backend! When using
the filesystem, make sure all processes can access the directory with
read and write permissions.
{% end %}
The file storage backend can be configured inside the `files` section
(see the default configs below):
```conf
files {
default-store = "database"
stores = {
database =
{ enabled = true
type = "default-database"
}
filesystem =
{ enabled = false
type = "file-system"
directory = "/some/directory"
}
minio =
{ enabled = false
type = "s3"
endpoint = "http://localhost:9000"
access-key = "username"
secret-key = "password"
bucket = "docspell"
}
}
}
```
The `stores` object defines a set of stores and the `default-store`
selects the one that should be used. All disabled store configurations
are removed from the list. Thus the `default-store` must be enabled.
Other enabled stores can be used as the target when copying files (see
below).
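
As an illustration, here is a minimal sketch of a `files` section
where the database stays the default and the filesystem store is
enabled as a second store, so that it can serve as a copy target. The
keys are taken from the default config above; `/some/directory` is
only the placeholder used there and must be replaced with a real
directory that all nodes can read and write.

```conf
files {
  # still the active store for reading and writing files
  default-store = "database"
  stores = {
    database =
      { enabled = true
        type = "default-database"
      }
    # enabled, but not the default: available as a copy target
    filesystem =
      { enabled = true
        type = "file-system"
        directory = "/some/directory"
      }
  }
}
```
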
A store configuration requires an `enabled` and a `type` property.
Depending on the `type`, further properties are required, as shown in
the default configuration above. The available storage types are
`default-database`, `file-system` and `s3`.
If you use the docker setup, the environment variables corresponding
to the config snippet above are listed
[below](#environment-variables).
### Change Backends
It is possible to change backends with a bit of manual effort. When
doing this, please make sure that the application is not in use. It is
important that no file is uploaded during the following steps.
The steps below use the [cli](@/docs/tools/cli.md), so please set it
up first. You also need to enable the
[admin endpoint](#admin-endpoint). Config changes mentioned here must
be applied to all nodes, both joex and restserver!
1. In the config, enable the file backend you want to switch to as a
second store (besides the default) and start docspell as normal.
Don't change `default-store` yet.
2. Run the file integrity check in order to see whether all files are
ok as they are in the current store. This can be done using the
[cli](@/docs/tools/cli.md) by running:
```bash
dsc admin file-integrity-check
```
3. Run the admin command for copying files, which copies all files
from the current `default-store` to all other enabled stores.
```bash
dsc admin clone-file-repository
```
And wait until it's done :-). You can see the progress in the jobs
page when logged in as `docspell-system` or just look at the logs.
4. In the config, change the `default-store` to the one you just
copied all the files to and restart docspell.
5. Login and do some smoke tests. Then run the file integrity check
again:
```bash
dsc admin file-integrity-check
```
If all is fine, you are done and are now using the new file
backend. If the second integrity check fails, please open an issue.
In that case, revert the config change from step 4 to use the
previous `default-store` again.
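
As a sketch of the config change in step 4, assuming the filesystem
store was the one enabled in step 1 and filled by the copy command,
the `files` section could look like this afterwards. Only
`default-store` changes; the directory is a placeholder and must match
the one used while copying.

```conf
files {
  # switched from "database" to the store the files were copied to
  default-store = "filesystem"
  stores = {
    database =
      { enabled = true
        type = "default-database"
      }
    filesystem =
      { enabled = true
        type = "file-system"
        directory = "/some/directory"
      }
  }
}
```

Reverting after a failed integrity check then means setting
`default-store` back to `"database"` on all nodes and restarting.
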
If you want to delete the files from the database, you can do so by
running the following SQL against the database:
```sql
DELETE FROM filechunk
```
You can copy them back into the database using the steps above.
## File Processing
Files are being processed by the joex component. So all the respective
@ -517,9 +680,14 @@ setting has significant impact, especially when your documents are in
German. Here are some rough numbers on jvm heap usage (the same file
was used for all tries):
<table class="table is-hoverable is-striped">
<table class="striped-basic">
<thead>
<tr><th>nlp.mode</th><th>English</th><th>German</th><th>French</th></tr>
<tr>
<th>nlp.mode</th>
<th>English</th>
<th>German</th>
<th>French</th>
</tr>
</thead>
<tfoot>
</tfoot>

View File

@ -3,6 +3,13 @@
@apply leading-relaxed text-left;
}
.content table.striped-basic tbody tr {
@apply border-t dark:border-stone-600;
}
.content table {
@apply w-full my-2 px-4;
}
.content h1:not(.no-default) {
@apply text-4xl font-serif font-bold mt-6 mb-3 py-1 border-b dark:border-stone-800 text-stone-700 dark:text-stone-200;
}