From 3e87feff7b8242d0c60d6b09db7a5d20b44cbddf Mon Sep 17 00:00:00 2001
From: eikek
Date: Mon, 21 Mar 2022 13:42:10 +0100
Subject: [PATCH] Add some docs for postgres fts

---
 website/site/content/docs/configure/_index.md | 103 +++++++++++++++++-
 .../site/content/docs/install/download_run.md |   2 +-
 2 files changed, 99 insertions(+), 6 deletions(-)

diff --git a/website/site/content/docs/configure/_index.md b/website/site/content/docs/configure/_index.md
index 4c09fce3..2fc11ed6 100644
--- a/website/site/content/docs/configure/_index.md
+++ b/website/site/content/docs/configure/_index.md
@@ -177,17 +177,37 @@ this account and setup the notification hooks in there - not in your
 normal account.
 
 
-## Full-Text Search: SOLR
+## Full-Text Search
 
-[Apache SOLR](https://solr.apache.org) is used to provide the
-full-text search. Both docspell components must provide the same
-connection setup. This is defined in the `full-text-search.solr`
+Full-text search is optional and provided by external systems. There
+are currently two backends available: [Apache
+SOLR](https://solr.apache.org) and [PostgreSQL's text
+search](https://www.postgresql.org/docs/14/textsearch.html).
+
+Enable the full-text search, configure the backends as described
+below and select the one to use with `backend`:
+
+```conf
+full-text-search {
+  enabled = true
+  # Which backend to use, either solr or postgresql
+  backend = "solr"
+  …
+}
+```
+
+All docspell components must use the same full-text search
+configuration.
+
+### SOLR
+
+[Apache SOLR](https://solr.apache.org) can be used to provide the
+full-text search. This is defined in the `full-text-search.solr`
 subsection:
 
 ``` bash
 ...
 full-text-search {
-  enabled = true
   ...
   solr = {
     url = "http://localhost:8983/solr/docspell"
@@ -247,6 +267,79 @@ The solr index doesn't contain any new information, it can be
 regenerated any time using the above REST call. Thus it doesn't need
 to be backed up.
 
+### PostgreSQL
+
+PostgreSQL provides many additional features, one of them being [text
+search](https://www.postgresql.org/docs/14/textsearch.html). Docspell
+can use it to provide the full-text search feature. This is
+especially useful if PostgreSQL is already the primary database for
+docspell.
+
+You can either use the same database or a separate connection. The
+full-text search creates a single table `ftspsql_search` that holds
+all necessary data. You can exclude this table from backups, because
+it can be recreated from the primary data at any time.
+
+The configuration is placed inside `full-text-search`:
+
+```conf
+full-text-search {
+  …
+  postgresql = {
+    use-default-connection = false
+
+    jdbc {
+      url = "jdbc:postgresql://server:5432/db"
+      user = "pguser"
+      password = ""
+    }
+
+    pg-config = {
+    }
+    pg-query-parser = "websearch_to_tsquery"
+    pg-rank-normalization = [ 4 ]
+  }
+}
+```
+
+Set `use-default-connection` to `true` if PostgreSQL is your primary
+database and should also be used for the full-text search. If set to
+`false`, the subsequent `jdbc` block defines the connection to the
+PostgreSQL database to use.
+
+The remaining settings tune PostgreSQL's text search feature. Please
+see [the PostgreSQL
+documentation](https://www.postgresql.org/docs/14/textsearch.html) for
+all the details.
+
+- `pg-config`: an optional mapping from document languages as used
+  in Docspell to a PostgreSQL text search configuration. Not all
+  languages are equally well supported out of the box. You can create
+  your own text search config in PostgreSQL and then define it in this
+  map for your language. For example:
+
+  ```conf
+  pg-config = {
+    english = "my-english"
+    german = "my-german"
+  }
+  ```
+
+  By default, the predefined configs are used for some languages, and
+  all other languages fall back to `simple`.
+
+  *If you change this setting, you must re-index everything.*
+- `pg-query-parser`: the parser applied to the full-text query. By
+  default it is `websearch_to_tsquery`. (relevant [doc
+  link](https://www.postgresql.org/docs/14/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES))
+- `pg-rank-normalization`: tweaks the rank calculation, which
+  determines the order of the results returned from a query. It is an
+  array of numbers out of `1`, `2`, `4`, `8`, `16` or `32`. (relevant
+  [doc
+  link](https://www.postgresql.org/docs/14/textsearch-controls.html#TEXTSEARCH-RANKING))
+
+
+
 ## Bind
 
 The host and port the http server binds to. This applies to both
diff --git a/website/site/content/docs/install/download_run.md b/website/site/content/docs/install/download_run.md
index bab0ee6b..e87f8ea5 100644
--- a/website/site/content/docs/install/download_run.md
+++ b/website/site/content/docs/install/download_run.md
@@ -110,7 +110,7 @@ Fulltext search is powered by [SOLR](https://solr.apache.org). You
 need to install solr and create a core for docspell. Then cange the
 solr url for both components (restserver and joex) accordingly. See
 the relevant section in the [config
-page](@/docs/configure/_index.md#full-text-search-solr).
+page](@/docs/configure/_index.md#full-text-search).
 
 ### Watching a directory
 