From 51a348ce3f76623b416a3b855e67b764b7e624e9 Mon Sep 17 00:00:00 2001
From: Eike Kettner <eike.kettner@posteo.de>
Date: Wed, 21 Oct 2020 21:24:26 +0200
Subject: [PATCH] Add a faq section to the docs

Closes #337
---
 website/site/content/docs/faq/_index.md | 137 ++++++++++++++++++++++++
 1 file changed, 137 insertions(+)
 create mode 100644 website/site/content/docs/faq/_index.md

diff --git a/website/site/content/docs/faq/_index.md b/website/site/content/docs/faq/_index.md
new file mode 100644
index 00000000..78484da4
--- /dev/null
+++ b/website/site/content/docs/faq/_index.md
@@ -0,0 +1,137 @@
++++
+title = "FAQ"
+weight = 100
+description = "Frequently asked questions."
+insert_anchor_links = "right"
+[extra]
+mktoc = true
++++
+
+# FAQ
+
+## Where are my files stored?
+
+Everything, including all files, is stored in the database.
+
+This seems to put off some people coming to Docspell, so here are
+some thoughts on why this is and why it may not be such a big deal.
+It was a conscious decision: the option to keep all files in the file
+system was considered, but not chosen in the end.
+
+First, it was clear that a database *is* required to support the
+planned features. An efficient multi-user application needs to store
+and query account data, passwords and many other things (tags,
+metadata etc.) reliably. Very often a relational model emerges, and a
+database is the best fit for it; anything else would just "reinvent
+the wheel". So the options are to have a database *and* files in the
+filesystem, or everything in one database. There are, of course, pros
+and cons for both ways. These were the reasons for the current
+decision:
+
+- Backups: With two storage systems, you have to take care to back up
+  both. All supported databases have good support for backups, so
+  having just one thing to back up is (usually) better than having to
+  back up two. YMMV if you already have a backup system in place.
+- Simpler, easier to maintain application: there is just one storage
+  system used by the application and not two, which reduces
+  complexity in the code.
+- Consistency: The two "databases" (filesystem + relational db) can
+  easily get out of sync, which breaks the application. It is a strong
+  plus to be able to rely on the ACID guarantees of database systems.
+- Distributed/Scaling: One goal is to run Docspell in a distributed
+  way. If files were stored in the filesystem, they would eventually
+  have to be transferred to all the nodes. Using the database as a
+  central storage and synchronization point solves this trivially.
+- Support for binary files in today's databases is not that bad.
+  Docspell is not meant to store very large files, so storing them in
+  the database remains quite efficient. I've used it several times and
+  never had problems related to this.
+  [This](https://wiki.postgresql.org/wiki/BinaryFilesInDB) PostgreSQL
+  wiki page lists some pros and cons.
+- The advantage of having files in the filesystem (e.g. being able to
+  browse them directly) is weakened if the files have to be stored
+  under a hash of their names, which might be necessary to overcome
+  certain filesystem limitations.
+- For low-volume/traffic installations where you just don't want to
+  run a real database server, you can use the
+  [H2](https://h2database.com) database. This is an in-process
+  database (comparable to SQLite) and doesn't require a separate
+  server.
+
+You can find more in these issues:
+[270](https://github.com/eikek/docspell/issues/270),
+[289](https://github.com/eikek/docspell/issues/289#issuecomment-700843894).
+
+
+## What's the exit strategy, then?
+
+Of course, there is no guarantee that this project will be alive in
+the future. It is important to know how you could use your data then.
+
+Most importantly, it is FREE software (as in freedom and in beer).
+That is, you can keep using the current version for as long as you
+want. So it is a good idea to back up the releases (or docker images)
+you are using along with your data. This ensures that you will be able
+to *use* your data "forever". It also means that you can inspect the
+data model and use the API and/or standard SQL tools to get all the
+data. While this may be difficult or inconvenient, the point is that
+it is possible: nothing is hidden or obscured, and nothing is lost.
+You can even back up the sources to keep this documentation, too.
+
+To move to a different tool, you need to get the data out of Docspell
+in a machine-readable, automated way. Currently, there is no *easy
+way* to do this. However, it is possible to get to all data with some
+scripting effort. Everything can be queried using the
+[HTTP/REST api](@/docs/api/_index.md), so you can write a script or
+program that, for example, queries all items and downloads their
+files. Something like this might be provided soon; for now, there are
+starting points in the `/tools` folder. A more convenient way to
+export the data into the file system is planned, but there is no ETA
+for it.
+
+My recommendation is to run periodic database backups and also store
+the binaries/docker images. This lets you re-create the current state
+at any time, which allows you to postpone the problem of getting the
+data out of Docspell in a specific format.
+
+Note that you don't need to back up the Solr instance (if you're using
+fulltext search), since Docspell can recreate it.
+
+
+## There are no thumbnails of my documents?
+
+Thumbnails are currently not implemented. I experimented with them
+early on and found that I don't need them :-) My documents were too
+similar, and I always found myself looking at the correspondent and
+tags instead. But adding thumbnails is planned! I just don't have an
+ETA.
+
+
+## What if my documents already contain OCR-ed text?
+
+Normally, documents are not OCR-ed twice. Docspell first extracts the
+text from a pdf. If this text is shorter than a configurable minimum
+length, it runs OCR anyway to see whether that yields more text, and
+then keeps the longer of the two texts. By default, all pdfs are
+handed to ocrmypdf, but it skips files that are already OCR-ed. The
+whole ocrmypdf step can be switched off in the config file, so if all
+your pdfs already contain text, this would be an option.
+Alternatively, you can change the ocrmypdf options in Docspell's
+config file to fit your requirements.
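+
+As a rough sketch, the settings mentioned above might look like this
+in the joex config file. The key names and values shown are
+assumptions based on the default configuration at the time of
+writing. Please check the reference configuration of your Docspell
+version before copying them:
+
+```
+docspell.joex {
+  # Assumed key: if the text read from a pdf is shorter than this
+  # many characters, OCR is run as well and the longer text is used.
+  extraction.pdf.min-text-len = 500
+
+  # Assumed key: set to false to switch off the ocrmypdf step
+  # entirely, e.g. if all your pdfs already contain text.
+  convert.ocrmypdf.enabled = false
+}
+```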
+
+
+## Is there support for migrating from other tools?
+
+Currently, there is a bash script to import files and metadata from
+[Paperless](https://github.com/the-paperless-project/paperless/).
+Please see [this issue](https://github.com/eikek/docspell/issues/358).
+
+
+## Wh…?
+
+If you have any questions, don't hesitate to ask. You can open an
+[issue](https://github.com/eikek/docspell/issues/new/choose) or leave
+a message in the [gitter](https://gitter.im/eikek/docspell) room. If
+you don't want to sign up there, drop a mail to `info` at
+`docspell.org`.