diff --git a/website/site/content/docs/faq/_index.md b/website/site/content/docs/faq/_index.md new file mode 100644 index 00000000..78484da4 --- /dev/null +++ b/website/site/content/docs/faq/_index.md @@ -0,0 +1,137 @@ ++++ +title = "FAQ" +weight = 100 +description = "Frequently asked questions." +insert_anchor_links = "right" +[extra] +mktoc = true ++++ + +# FAQ + +## Where are my files stored? + +Everything, including all files, are stored in the database. + +Now that seems to put off some people coming to Docspell, so here are +some thoughts on why this is and why is may be not such a big deal. It +was a conscious decision and the option to hold all files in the file +system was considered, but not chosen in the end. + +First, it was clear that a database *is* required in order to support +the planned features. It is required to efficiently support a +multi-user application: the account data, passwords and many other +things (tags, metadata etc) must be stored and queried reliably. Very +often a relational model emerges and a database is the best fit, +otherwise one would just "reinvent the wheel". So the options are to +have a database *and* files in the filesystem or everything in one +database. There are, of course, pros and cons for both ways, these +were the reasons for the current decision: + +- Backups: With two things, you have to take care to backup both. All + supported databases have good support for backups so having just one + thing to backup is (usually) better than having to backup two + things. YMMV if you already have some backups system in place. +- Simpler, easier to maintain application: there is just one storage + system used by the application and not two which reduces complexity + in the code. +- Consistency: Both "databases" (filesystem + relational db) can + easily get out of sync and this will break the application. It's + very strong plus to be able to rely on the strong ACID guarantees of + database systems. +- Distributed/Scaling: One goal is to run Docspell in a distributed + way. If files were on the filesystem, the problem is that they have + to be transferred to all the nodes eventually. This is trivially + solved to use the database as a central storage and synchronization + point. +- Support for binary files in today's databases is not that bad. + Docspell has no intention to store very large files. It will be + quite efficient. I've used it several times and never had problems + related to this. + [This](https://wiki.postgresql.org/wiki/BinaryFilesInDB) postgres + page shows some pros and cons. +- The advantage of having files in the filesystem is weakened, if + files have to be stored using some hash of filenames which might be + necessary in order to overcome certain file-system limitations. +- For low-volume/traffic installations where you just don't want to + run a real database server, you can use the + [H2](https://h2database.com) database. This is an in-process + database (comparable to sqlite) and doesn't require another server + running. + +You can find more in these issues: +[270](https://github.com/eikek/docspell/issues/270), +[289](https://github.com/eikek/docspell/issues/289#issuecomment-700843894). + + +## What's the Exit Strategy then? + +Of course, there is no guarantee that this project will be alive in +the future. It is important to know how to use your data then. + +A very important thing is: it is FREE software (as in freedom and in +beer). That is, you can be sure to use the current version for as long +as you want. So it is a good idea to backup the releases (or docker +images) you are using alongside with your data. This ensures that you +will be able to *use* your data "forever". This also means that you +can inspect the data model and use the api and/or standard SQL tools +to get all the data. While this may be difficult/inconvenient, the +point here is only that it is possible. It's not hidden or obscured, +nothing is lost. You can even backup the sources to keep this +documentation, too. + +In order to move to a different tool, it is necessary to get the data +out of Docspell in a machine readable/automatic way. Currently, there +is no *easy way* for this. However, it is possible to get to all data +with some scripting effort. Everything can be queried using a +[HTTP/REST api](@/docs/api/_index.md) and so you can write a +script/program that, for example, queries all items and downloads the +files (something like this might be provided soon, for now there are +starting points in the `/tools` folder). It is planned to provide a +more convenient way to export the data into the file system. But there +is no ETA for this. + +My recommendation is to run periodic database backups and also store +the binaries/docker images. This lets you re-create the current state +any time which allows to postpone the problem of getting the data in a +specific format out of Docspell. + +Note that you don't need to backup the SOLR instance (if you're using +fulltext search), since it can be recreated by Docspell. + + +## There are no thumbnails of my documents? + +Thumbnails are currently not implemented. I experimented with this +early and found that I don't need them :-) My documents were too +similar and I found myself looking always at correspondent and tags. +But it is planned to add thumbnails! I just don't have an ETA. + + +## What if my documents already contain OCR-ed text? + +Documents are not ocr-ed twice normally. Doscpell first extracts the +text from a pdf. If this is below some configurable minimum length, it +will still run OCR just to see if that gives more. Then the longer of +the texts is taken. By default it will hand all pdfs to ocrmypdf, but +this will skip already ocred files. The whole ocrmypdf process can be +switched off in the config file. So if you only have these pdfs, this +would be an option, I guess. Alternatively, it is possible to change +the ocrmypdf options in docspell's config file to fit your +requirements. + + +## Is there support for migrating from other tools? + +Currently there exists a bash script to import files and metadata from +[Paperless](https://github.com/the-paperless-project/paperless/). +Please see this [issue](https://github.com/eikek/docspell/issues/358). + + +## Wh…? + +If you have any questions, don't hesitate to ask. You can open an +[issue](https://github.com/eikek/docspell/issues/new/choose) or leave +a message in the [gitter](https://gitter.im/eikek/docspell) room. If +you don't want to sign-up there, drop a mail to `info` at +`docspell.org`.