mirror of
https://github.com/TheAnachronism/docspell.git
synced 2025-05-28 19:19:15 +00:00
Add some docs to file backends
parent
290b4ca58b
commit
b71085761b
@ -8,21 +8,20 @@ package docspell.joex.hk
|
||||
|
||||
import cats.effect._
|
||||
import cats.implicits._
|
||||
|
||||
import docspell.common._
|
||||
import docspell.joex.Config
|
||||
import docspell.joex.scheduler.Task
|
||||
import docspell.store.records._
|
||||
import docspell.store.usertask.UserTaskScope
|
||||
|
||||
import com.github.eikek.calev._
|
||||
import docspell.backend.ops.OFileRepository
|
||||
|
||||
object HouseKeepingTask {
|
||||
private val periodicId = Ident.unsafe("docspell-houskeeping")
|
||||
|
||||
val taskName: Ident = Ident.unsafe("housekeeping")
|
||||
|
||||
def apply[F[_]: Async](cfg: Config): Task[F, Unit, Unit] =
|
||||
def apply[F[_]: Async](cfg: Config, fileRepo: OFileRepository[F]): Task[F, Unit, Unit] =
|
||||
Task
|
||||
.log[F, Unit](_.info(s"Running house-keeping task now"))
|
||||
.flatMap(_ => CleanupInvitesTask(cfg.houseKeeping.cleanupInvites))
|
||||
|
@ -160,6 +160,22 @@ enabled by providing a secret:
|
||||
This secret must be provided to all requests to a `/api/v1/admin/`
|
||||
endpoint.
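As a rough illustration of how the secret travels with a request, the following shell sketch wraps `curl` in a small helper. It assumes the secret is sent in the `Docspell-Admin-Secret` header and that the restserver listens on `http://localhost:7880`; the names `admin_post`, `BASE_URL`, `ADMIN_SECRET` and `CURL` are made up for this example and are not part of Docspell.

```bash
#!/usr/bin/env bash
# Sketch: send a POST request to a Docspell admin endpoint.
# BASE_URL and ADMIN_SECRET are placeholders; adjust them to your setup.
BASE_URL="${BASE_URL:-http://localhost:7880}"
ADMIN_SECRET="${ADMIN_SECRET:-test123}"
CURL="${CURL:-curl}"   # overridable, e.g. CURL=echo for a dry run

# admin_post PATH: POST to /api/v1/admin/PATH with the secret header.
admin_post() {
  "$CURL" -XPOST -H "Docspell-Admin-Secret: ${ADMIN_SECRET}" \
    "${BASE_URL}/api/v1/admin/$1"
}
```

With the helper defined, `admin_post fts/reIndexAll` would submit the full-text re-index request. Setting `CURL=echo` prints the arguments instead of sending anything.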
|
||||
|
||||
The most convenient way to execute admin tasks is to use the
|
||||
[cli](@/docs/tools/cli.md). You get a list of possible admin commands
|
||||
via `dsc admin help`.
|
||||
|
||||
There are several ways to see the output of the commands:

1. Look at the joex logs, which give the most detail.
2. Use the job-queue page when logged in as `docspell-system`.
3. Set up a [webhook](@/docs/webapp/notification.md) to be notified
   when a job finishes. This way you get a small message.

All admin tasks (and also some other system tasks) are run under the
account `docspell-system` (collective and user). You need to create
this account and set up the notification hooks in there - not in your
normal account.
|
||||
|
||||
|
||||
## Full-Text Search: SOLR
|
||||
|
||||
@ -204,6 +220,12 @@ a call:
|
||||
$ curl -XPOST -H "Docspell-Admin-Secret: test123" http://localhost:7880/api/v1/admin/fts/reIndexAll
|
||||
```
|
||||
|
||||
or use the [cli](@/docs/tools/cli.md):
|
||||
|
||||
```bash
|
||||
dsc admin -a test123 recreate-index
|
||||
```
|
||||
|
||||
Here the `test123` is the key defined with `admin-endpoint.secret`. If
|
||||
it is empty (the default), this call is disabled (all admin routes).
|
||||
Otherwise, the POST request will submit a system task that is executed
|
||||
@ -445,6 +467,147 @@ If you find that these methods do not suffice for your case, please
|
||||
open an issue.
|
||||
|
||||
|
||||
## File Backends
|
||||
|
||||
Docspell lets you choose from different storage backends for binary
files. The options are:

1. *Database (the recommended default)*

   The database can be used to store the files as well. It is the
   default. It doesn't require any other configuration and works well
   with multiple instances of restservers and joex nodes.
2. *S3*

   The S3 backend stores files in S3-compatible storage. It was
   tested with MinIO, which can be self-hosted.

3. *Filesystem*

   The filesystem can also be used directly, by specifying a
   directory. Be aware that _all_ nodes must have read and write
   access to this directory! When running multiple nodes over a
   network, consider using one of the above instead. Docspell uses a
   fixed structure for storing the files below the given directory;
   this structure cannot be configured.

When using S3 or the filesystem, remember to back up the database *and*
the files!

Note that Docspell stores not only the files that are uploaded, but
also some other files for internal use.
|
||||
|
||||
### Configuring
|
||||
|
||||
{% warningbubble(title="Note") %}
|
||||
|
||||
Each node must have the same config for its file backend! When using
|
||||
the filesystem, make sure all processes can access the directory with
|
||||
read and write permissions.
|
||||
|
||||
{% end %}
|
||||
|
||||
The file storage backend can be configured inside the `files` section
|
||||
(see the default configs below):
|
||||
|
||||
```conf
files {
  …
  default-store = "database"

  stores = {
    database =
      { enabled = true
        type = "default-database"
      }

    filesystem =
      { enabled = false
        type = "file-system"
        directory = "/some/directory"
      }

    minio =
      { enabled = false
        type = "s3"
        endpoint = "http://localhost:9000"
        access-key = "username"
        secret-key = "password"
        bucket = "docspell"
      }
  }
}
```
|
||||
|
||||
The `stores` object defines a set of stores and the `default-store`
|
||||
selects the one that should be used. All disabled store configurations
|
||||
are removed from the list. Thus the `default-store` must be enabled.
|
||||
Other enabled stores can be used as the target when copying files (see
|
||||
below).
|
||||
|
||||
A store configuration requires an `enabled` and a `type` property.
Depending on the `type` property, other properties are required; they
are shown in the snippet above. The available storage types are
`default-database`, `file-system` and `s3`.

If you use the docker setup, you can find the environment variables
corresponding to the above config snippet
[below](#environment-variables).
|
||||
|
||||
### Change Backends
|
||||
|
||||
It is possible to change backends with a bit of manual effort. While
doing this, please make sure that the application is not in use. It is
important that no file is uploaded during the following steps.

The [cli](@/docs/tools/cli.md) is used for this, so please set it up
first. You also need to enable the [admin endpoint](#admin-endpoint).
Config changes mentioned here must be applied to all nodes - joex and
restserver!

1. In the config, enable the second file backend (besides the default)
   that you want to change to and start docspell as normal. Don't change
   `default-store` yet.
2. Run the file integrity check to see whether all files are
   ok as they are in the current store. This can be done using the
   [cli](@/docs/tools/cli.md) by running:

   ```bash
   dsc admin file-integrity-check
   ```
3. Run the copy files admin command, which copies all files from the
   current `default-store` to all other enabled stores.

   ```bash
   dsc admin clone-file-repository
   ```

   And wait until it's done :-). You can see the progress in the jobs
   page when logged in as `docspell-system` or just look at the logs.
4. In the config, change the `default-store` to the one you just
   copied all the files to and restart docspell.
5. Log in and do some smoke tests. Then run the file integrity check
   again:

   ```bash
   dsc admin file-integrity-check
   ```

If all is fine, then you are done and are now using the new file
backend. If the second integrity check fails, please open an issue.
You then need to revert the config change of step 4 to use the
previous `default-store` again.
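The CLI calls from the steps above could be collected in a few shell helpers, sketched below. This assumes the admin secret is passed with `-a`, as in the `recreate-index` example elsewhere in these docs. The function names and the `DSC` and `ADMIN_SECRET` variables are invented for illustration. The helpers are deliberately not chained, because each command appears only to submit a job, and the next step should wait until that job has finished.

```bash
#!/usr/bin/env bash
# Sketch of the cli calls used when switching file backends.
# Wait for each submitted job to finish (jobs page of `docspell-system`
# or the joex logs) before running the next helper.
DSC="${DSC:-dsc}"                       # path to the cli; overridable for dry runs
ADMIN_SECRET="${ADMIN_SECRET:-test123}" # placeholder for admin-endpoint.secret

# dsc_admin SUBCOMMAND: run an admin subcommand with the secret.
dsc_admin() {
  "$DSC" admin -a "$ADMIN_SECRET" "$1"
}

# Steps 2 and 5: verify the files in the current default store.
check_integrity() { dsc_admin file-integrity-check; }

# Step 3: copy all files from the default store to the other enabled stores.
clone_files() { dsc_admin clone-file-repository; }
```

A run would then be `check_integrity`, then `clone_files`, then the manual config change and restart from step 4, and finally `check_integrity` again.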
|
||||
|
||||
If you want to delete the files from the database, you can do so by
|
||||
running the following SQL against the database:
|
||||
|
||||
```sql
|
||||
DELETE FROM filechunk
|
||||
```
|
||||
|
||||
You can copy them back into the database using the steps above.
|
||||
|
||||
|
||||
## File Processing
|
||||
|
||||
Files are being processed by the joex component. So all the respective
|
||||
@ -517,9 +680,14 @@ setting has significant impact, especially when your documents are in
|
||||
German. Here are some rough numbers on jvm heap usage (the same file
|
||||
was used for all tries):
|
||||
|
||||
<table class="table is-hoverable is-striped">
|
||||
<table class="striped-basic">
|
||||
<thead>
|
||||
<tr><th>nlp.mode</th><th>English</th><th>German</th><th>French</th></tr>
|
||||
<tr>
|
||||
<th>nlp.mode</th>
|
||||
<th>English</th>
|
||||
<th>German</th>
|
||||
<th>French</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tfoot>
|
||||
</tfoot>
|
||||
|
@ -3,6 +3,13 @@
|
||||
@apply leading-relaxed text-left;
|
||||
}
|
||||
|
||||
.content table.striped-basic tbody tr {
|
||||
@apply border-t dark:border-stone-600;
|
||||
}
|
||||
.content table {
|
||||
@apply w-full my-2 px-4;
|
||||
}
|
||||
|
||||
.content h1:not(.no-default) {
|
||||
@apply text-4xl font-serif font-bold mt-6 mb-3 py-1 border-b dark:border-stone-800 text-stone-700 dark:text-stone-200;
|
||||
}
|
||||
|