mirror of https://github.com/TheAnachronism/docspell.git, synced 2025-08-05 02:24:52 +00:00
Add some docs to file backends
````diff
@@ -8,21 +8,20 @@ package docspell.joex.hk
 
 import cats.effect._
 import cats.implicits._
 
 import docspell.common._
 import docspell.joex.Config
 import docspell.joex.scheduler.Task
 import docspell.store.records._
 import docspell.store.usertask.UserTaskScope
 
 import com.github.eikek.calev._
+import docspell.backend.ops.OFileRepository
 
 object HouseKeepingTask {
 private val periodicId = Ident.unsafe("docspell-houskeeping")
 
 val taskName: Ident = Ident.unsafe("housekeeping")
 
-def apply[F[_]: Async](cfg: Config): Task[F, Unit, Unit] =
+def apply[F[_]: Async](cfg: Config, fileRepo: OFileRepository[F]): Task[F, Unit, Unit] =
 Task
 .log[F, Unit](_.info(s"Running house-keeping task now"))
 .flatMap(_ => CleanupInvitesTask(cfg.houseKeeping.cleanupInvites))
````
````diff
@@ -160,6 +160,22 @@ enabled by providing a secret:
 This secret must be provided to all requests to a `/api/v1/admin/`
 endpoint.
 
+The most convenient way to execute admin tasks is to use the
+[cli](@/docs/tools/cli.md). You get a list of possible admin commands
+via `dsc admin help`.
+
+To see the output of the commands, there are these ways:
+
+1. looking at the joex logs, which gives most details.
+2. Use the job-queue page when logged in as `docspell-system`
+3. setup a [webhook](@/docs/webapp/notification.md) to be notified
+when a job finishes. This way you get a small message.
+
+All admin tasks (and also some other system tasks) are run under the
+account `docspell-system` (collective and user). You need to create
+this account and setup the notification hooks in there - not in your
+normal account.
+
 
 ## Full-Text Search: SOLR
 
````
````diff
@@ -204,6 +220,12 @@ a call:
 $ curl -XPOST -H "Docspell-Admin-Secret: test123" http://localhost:7880/api/v1/admin/fts/reIndexAll
 ```
 
+or use the [cli](@/docs/tools/cli.md):
+
+```bash
+dsc admin -a test123 recreate-index
+```
+
 Here the `test123` is the key defined with `admin-endpoint.secret`. If
 it is empty (the default), this call is disabled (all admin routes).
 Otherwise, the POST request will submit a system task that is executed
````
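The curl call above sends a POST to `/api/v1/admin/fts/reIndexAll` and passes the admin secret in a `Docspell-Admin-Secret` header. For illustration only, the same request could be built from Scala with the JDK's `java.net.http` client (Java 11+). The `AdminReIndex` object and its methods are hypothetical and are not part of docspell. The base URL and secret are the example values from the docs, not defaults you should rely on.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical helper, not part of docspell: mirrors the curl example above.
object AdminReIndex {

  // Builds the POST request with an empty body; the admin secret travels
  // in the `Docspell-Admin-Secret` header.
  def request(baseUrl: String, secret: String): HttpRequest =
    HttpRequest
      .newBuilder(URI.create(s"$baseUrl/api/v1/admin/fts/reIndexAll"))
      .header("Docspell-Admin-Secret", secret)
      .POST(HttpRequest.BodyPublishers.noBody())
      .build()

  // Sends the request and returns the HTTP status code. If
  // `admin-endpoint.secret` is empty, the admin routes are disabled.
  def send(baseUrl: String, secret: String): Int =
    HttpClient
      .newHttpClient()
      .send(request(baseUrl, secret), HttpResponse.BodyHandlers.ofString())
      .statusCode()
}
```

Calling `AdminReIndex.send("http://localhost:7880", "test123")` against a running restserver would submit the re-index task. Building the request separately from sending it means you can check the URL and header without a server.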
````diff
@@ -445,6 +467,147 @@ If you find that these methods do not suffice for your case, please
 open an issue.
 
 
+## File Backends
+
+Docspell allows to choose from different storage backends for binary
+files. You can choose between:
+
+1. *Database (the recommended default)*
+
+The database can be used to store the files as well. It is the
+default. It doesn't require any other configuration and works well
+with multiple instances of restservers and joex nodes.
+2. *S3*
+
+The S3 backend allows to store files in an S3 compatible storage.
+It was tested with MinIO, which is possible to self host.
+
+3. *Filesystem*
+
+The filesystem can also be used directly, by specifying a
+directory. Be aware that _all_ nodes must have read and write
+access into this directory! When running multiple nodes over a
+network, consider using one of the above instead. Docspell uses a
+fixed structure for storing the files below the given directory, it
+cannot be configured.
+
+When using S3 or filesystem, remember to backup the database *and* the
+files!
+
+Note that Docspell not only stores the file that are uploaded, but
+also some other files for internal use.
+
+### Configuring
+
+{% warningbubble(title="Note") %}
+
+Each node must have the same config for its file backend! When using
+the filesystem, make sure all processes can access the directory with
+read and write permissions.
+
+{% end %}
+
+The file storage backend can be configured inside the `files` section
+(see the default configs below):
+
+```conf
+files {
+…
+default-store = "database"
+
+stores = {
+database =
+{ enabled = true
+type = "default-database"
+}
+
+filesystem =
+{ enabled = false
+type = "file-system"
+directory = "/some/directory"
+}
+
+minio =
+{ enabled = false
+type = "s3"
+endpoint = "http://localhost:9000"
+access-key = "username"
+secret-key = "password"
+bucket = "docspell"
+}
+}
+}
+```
+
+The `stores` object defines a set of stores and the `default-store`
+selects the one that should be used. All disabled store configurations
+are removed from the list. Thus the `default-store` must be enabled.
+Other enabled stores can be used as the target when copying files (see
+below).
+
+A store configuration requires a `enabled` and `type` property.
+Depending on the `type` property, other properties are required, they
+are presented above. The available storage types are
+`default-database`, `file-system` and `s3`.
+
+If you use the docker setup, you can find the corresponding
+environment variables to the above config snippet
+[below](#environment-variables).
+
+### Change Backends
+
+It is possible to change backends with a bit of manual effort. When
+doing this, please make sure that the application is not used. It is
+important that no file is uploaded during the following steps.
+
+The [cli](@/docs/tools/cli.md) will be used, please set it up first
+and you need to enable the [admin endpoint](#admin-endpoint). Config
+changes mentioned here must be applied to all nodes - joex and
+restserver!
+
+1. In the config, enable a second file backend (besides the default)
+you want to change to and start docspell as normal. Don't change
+`default-store` yet.
+2. Run the file integrity check in order to see whether all files are
+ok as they are in the current store. This can be done using the
+[cli](@/docs/tools/cli.md) by running:
+
+```bash
+dsc admin file-integrity-check
+```
+3. Run the copy files admin command which will copy all files from the
+current `default-store` to all other enabled stores.
+
+```bash
+dsc admin clone-file-repository
+```
+
+And wait until it's done :-). You can see the progress in the jobs
+page when logged in as `docspell-system` or just look at the logs.
+4. In the config, change the `default-store` to the one you just
+copied all the files to and restart docspell.
+5. Login and do some smoke tests. Then run the file integrity check
+again:
+
+```bash
+dsc admin file-integrity-check
+```
+
+If all is fine, then you are done and are now using the new file
+backend. If the second integrity check fails, please open an issue.
+You need then to revert the config change of step 4 to use the
+previous `default-store` again.
+
+If you want to delete the files from the database, you can do so by
+running the following SQL against the database:
+
+```sql
+DELETE FROM filechunk
+```
+
+You can copy them back into the database using the steps above.
+
+
 ## File Processing
 
 Files are being processed by the joex component. So all the respective
````
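The hunk above documents a selection rule for the `files` section. Disabled stores are dropped, so `default-store` must name an enabled store, and the remaining enabled stores are the targets when copying files. That rule can be sketched in plain Scala, with some assumptions. This is a simplified model, not docspell's config loader. `FileStoreSelection`, `StoreConfig`, `select` and `copyTargets` are hypothetical names, and only the `enabled` and `type` properties are modelled.

```scala
// Hypothetical, simplified model of the `files` section; not docspell's code.
object FileStoreSelection {

  // One entry below `files.stores`; only `enabled` and `type` are modelled.
  final case class StoreConfig(enabled: Boolean, storeType: String)

  // The storage types listed in the docs.
  val knownTypes: Set[String] = Set("default-database", "file-system", "s3")

  // Drops disabled stores, then looks up `default-store` among the rest.
  // Returns Left if the default store is missing, disabled or of unknown type.
  def select(
      defaultStore: String,
      stores: Map[String, StoreConfig]
  ): Either[String, (String, StoreConfig)] = {
    val enabled = stores.filter { case (_, cfg) => cfg.enabled }
    enabled.get(defaultStore) match {
      case None =>
        Left(s"default-store '$defaultStore' is not an enabled store")
      case Some(cfg) if !knownTypes.contains(cfg.storeType) =>
        Left(s"store '$defaultStore' has unknown type '${cfg.storeType}'")
      case Some(cfg) =>
        Right(defaultStore -> cfg)
    }
  }

  // All other enabled stores: the targets that files are copied to.
  def copyTargets(defaultStore: String, stores: Map[String, StoreConfig]): Set[String] =
    stores.collect { case (name, cfg) if cfg.enabled && name != defaultStore => name }.toSet
}
```

Selecting the enabled `database` entry succeeds. Pointing `default-store` at a disabled entry, such as `filesystem` in the default config above, is rejected, which matches the rule that the default store must be enabled.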
````diff
@@ -517,9 +680,14 @@ setting has significant impact, especially when your documents are in
 German. Here are some rough numbers on jvm heap usage (the same file
 was used for all tries):
 
-<table class="table is-hoverable is-striped">
+<table class="striped-basic">
 <thead>
-<tr><th>nlp.mode</th><th>English</th><th>German</th><th>French</th></tr>
+<tr>
+<th>nlp.mode</th>
+<th>English</th>
+<th>German</th>
+<th>French</th>
+</tr>
 </thead>
 <tfoot>
 </tfoot>
````
````diff
@@ -3,6 +3,13 @@
 @apply leading-relaxed text-left;
 }
 
+.content table.striped-basic tbody tr {
+@apply border-t dark:border-stone-600;
+}
+.content table {
+@apply w-full my-2 px-4;
+}
+
 .content h1:not(.no-default) {
 @apply text-4xl font-serif font-bold mt-6 mb-3 py-1 border-b dark:border-stone-800 text-stone-700 dark:text-stone-200;
 }
````