diff --git a/website/site/content/docs/configure/_index.md b/website/site/content/docs/configure/_index.md index 3b8ec7fd..6af41559 100644 --- a/website/site/content/docs/configure/_index.md +++ b/website/site/content/docs/configure/_index.md @@ -405,7 +405,30 @@ allows comments and has some [advanced features](https://github.com/lightbend/config#features-of-hocon). Please refer to their documentation for more on this. -Here are the default configurations. +A short description (please see the links for better understanding): +The config consists of key-value pairs and can be written in a +JSON-like format (called HOCON). Keys are organized in trees, and a +key defines a full path into the tree. There are two ways: + +``` +a.b.c.d=15 +``` + +or + +``` +a { + b { + c { + d = 15 + } + } +} +``` + +Both are exactly the same and these forms are both used at the same +time. Usually the braces approach is used to group some more settings, +for better readability. # Default Config diff --git a/website/site/content/docs/faq/_index.md b/website/site/content/docs/faq/_index.md index 1f8ace44..fa25bcc1 100644 --- a/website/site/content/docs/faq/_index.md +++ b/website/site/content/docs/faq/_index.md @@ -114,6 +114,11 @@ Currently there exists a bash script to import files and metadata from [Paperless](https://github.com/the-paperless-project/paperless/). Please see this [issue](https://github.com/eikek/docspell/issues/358). +## Why another DMS? + +Back when Docspell started, there weren't as many options as there are +now. I wanted to try out a different approach. You can read more about +that [here](@/docs/intro/_index.md#rationale). ## Wh…? 
diff --git a/website/site/content/docs/intro/_index.md b/website/site/content/docs/intro/_index.md index c7bd7e7d..d14484be 100644 --- a/website/site/content/docs/intro/_index.md +++ b/website/site/content/docs/intro/_index.md @@ -11,26 +11,22 @@ mktoc = true Docspell aims to be a simple yet effective document organizer that makes stowing documents away very quick and finding them later -reliable (and also fast). It doesn't require technical background or -studying huge manuals in order to use it. With this in mind, it is -rather opinionated and more targeted for home use and small/medium -organizations. +reliable (and also fast). It is a bit opinionated and more targeted +for home use and small/medium organizations. In contrast to many DMS, the main focus is not so much to provide all kinds of features to manually create organizational structures, like folder hierarchies, where you place the documents yourself. The -approach is more to leave it as a big pile of documents, but extract -and attach metadata from each document. These are mainly properties -that emerge from the document itself. The reason is that this is -possible to automate and that many custom folder structures include -these metadata somewhere, too. This makes it very simple to *add* -documents, because there is no time spent to think about where to put -it. And it is possible to apply different structures on top later, -like show first all documents of a specific correspondent, then all -with tag 'invoice', etc. If these properties are attached to all -documents, it is really easy to find a document. It even can be -combined with fulltext search for the, hopefully rare, desperate -cases. +approach is to leave it as a big pile of documents, but extract and +attach metadata from each document. These are mainly properties that +emerge from the document itself. The reason is that this is possible +to automate. 
This makes it very simple to *add* documents, because +there is no time spent to think about where to put it. And it is +possible to apply different structures on top later, like show first +all documents of a specific correspondent, then all with tag +'invoice', etc. If these properties are attached to all documents, it +is really easy to find a document. It even can be combined with +fulltext search for the, hopefully rare, desperate cases. Of course, it is also possible to add custom properties and arbitrary tags. @@ -153,3 +149,115 @@ create a folder and associate members. It is possible to put items in these folders and docspell shows only items that are either in no specific folder or in a folder where the current user is owner or member. + +# Rationale + +In 2019, I started to think about creating a dms-like tool that is now +Docspell. It started at the end of that year with the initial version, +including the very basic idea around which I want to create some kind +of document management system. + +The following anecdote summarizes why I thought yet another dms-like +tool might be useful. + +I tried some DMS at that time, to see whether they could help me with +the ever growing pile of documents. It's not just postal mail, now it +gets mixed with invoices via e-mail, bank statements I need to +download at some "portal" etc. It's all getting a huge mess. When +looking for a specific document, it's hard to find. + +I found all the enterprisy DMS are way above what I need. They are +rather difficult to set up and very hard to explain to non-technical +people. They offer a lot of features and there is quite some time +required to extract what's needed. I then discovered tools that seemed +to better suit my needs. Their design was simple and very close to +what I was looking for, making it a good fit for a single user. There +were only a few things to nag: + +1. Often it was not possible to track multiple files as one "unit". 
+ For example: reports with accompanying pictures that I would like + to treat as a single unit. It also more naturally fits to the + common e-mail. +2. Missing good multi-user support; and/or a simple enough interface + so that non-technical users can also make sense of it. +3. Missing some features important to me, like "send this by mail", a + full REST api, and some more +4. still a lot of "manually" organizing documents + +These are not big complaints, they are solvable somehow. I want to +focus on the last point: most systems didn't offer help with +organizing the documents. I didn't find any that included basic +machine learning features. On most systems it was possible to organize +documents into a custom folder structure. But it was all manual. You +would need to move incoming documents into some subfolder. Some +systems offered rules that get applied to documents in order to put +them into the right place. Many offered tags, too, which relieves some +of weight of this text. But they were also all manual. So the idea +came to let the computer do a little more to help organize documents. + +Let's start with the rules approach: A rule may look like this: + +> when the document contains a text 'invoice' and 'repair company x', +> then put it in subfolder B. + +This rule can be applied to all the new documents to get automatically +placed into this subfolder. I think there are some drawbacks to this +approach: + +- rules may change over time. Then you either must re-apply them all + to all documents or leave older ones where they are. If re-applying + them, some documents may not be in places as before which can easily + confuse coworkers. +- these rules may interfere with each other, then it might get more + difficult to know where a document is +- rules can become complex, be comprised of regular expressions, which + are really only suited to technical people and need to be + maintained. + +I decided to try out a different approach: a "search-only" one¹. 
+Instead of using a manual created folder structure, I simply search +every time using this rule. In essence such a rule is a search query. +But searching with rules like the one above is not very efficient. One +would need to do fulltext searches, even extracting dates "on the fly" +etc. It wouldn't be very reliable either. That's why documents have +properties (called metadata). In my case most of them have a +correspondent, a date and so on. If these properties were defined on +documents, the queries become quite efficient. The idea is now, not to +use rules for moving documents to some place, but for attaching +properties, information, to each document. This solves a few issues: +they can't get easily out of sync, and they can't interfere. Then +docspell can help with finding some of these properties automatically. +For example: it can propose properties by looking at the text. It can +also take existing documents into account when suggesting tags. In +docspell, it is not possible to define custom rules, instead it tries +to find these rules for you by looking at the text and your previous +documents. + +That said, there is still a manual process involved, but I found it +much lighter. Once in a while, looking at new documents and confirming +or fixing the metadata is necessary. This doesn't involve deciding for +a place, though. What properties you are interested to track can be +configured; should you only need a correspondent and a date, +everything else can be hidden. + +So in docspell, all documents are just in one big pile… but every +document has metadata attached that can be used to quickly find what +you need. There is no folder structure, but it is possible to later +apply certain hierarchical structures. It would be possible to create +a "folder structure", like the one mentioned above: click on +correspondent `repair company x`; then on tag `invoice`, then +`concerning=car` and `year=2019`. A UI could be created to present +exactly this hierarchy. 
Since I can't know your preferred structure +(not even my own…!), the docspell ui allows every combination, +regardless of any hierarchies. You can first select a correspondent, then +a tag or the other way around. Usually it's not necessary to go very +deep. + +That's all about it! I thought why not try this approach and at the +same time learn about some technologies around. In the last year, +docspell evolved to a quite usable tool, imho. This was only possible, +because very nice people gave valuable feedback and ideas! + + +¹This is inspired by tools like +[mu](https://www.djcbsoftware.nl/code/mu/) and GMail. diff --git a/website/site/content/docs/query/_index.md b/website/site/content/docs/query/_index.md index dad3d732..8a861be8 100644 --- a/website/site/content/docs/query/_index.md +++ b/website/site/content/docs/query/_index.md @@ -9,7 +9,7 @@ mktoc = true Docspell uses a query language to provide a powerful way to search for -your documents. It is targeted at advanced users and it needs to be +your documents. It is targeted at "power users" and it needs to be enabled explicitely in your user settings.
@@ -62,7 +62,7 @@ There are 7 operators: - `>` for greater-than - `>=` for greater-equals - `~=` for "in" (a shorter way to say "a or b or c or d") -- `:` for "like" +- `:` for "like", this is used in a context-sensitive way - `<` for lower than - `<=` for lower-equal - `!=` for not-equals @@ -76,7 +76,7 @@ what operators are allowed. There are fields where an item can have at most one value (like `name` or `notes`) and there are fields where an item can have multiple values (like `tag`). At last there are special fields that are either implemented directly using custom sql or that -are shortcuts to a longer form. +are only shortcuts to a longer form. Here is the list of all available fields. @@ -104,7 +104,7 @@ These fields map to at most one value: show only incoming, `false` to show only outgoing. These fields support all operators, except `incoming` and `inbox` -which expect boolean values and there these operators don't make much +which expect boolean values and for those some operators don't make sense. Fields that map to more than one value: @@ -117,18 +117,18 @@ The tag and category fields use two operators: `:` and `=`. Other special fields: -- `attach.id` -- `checksum` -- `content` +- `attach.id` references the id of an attachment +- `checksum` references the sha256 checksum of a file +- `content` for fulltext search - `f` for referencing custom fields by name - `f.id` for referencing custom fields by their id - `dateIn` a shortcut for a range search - `dueIn` a shortcut for a range search - `exist` check if some porperty exists -- `names` -- `year` -- `conc` -- `corr` +- `names` a shortcut to search in several names via `:` +- `year` a shortcut for a year range +- `conc` a shortcut for concerning person and equipment names +- `corr` a shortcut for correspondent org and person names These fields are often using the `:` operator to simply separate field and value. 
They are often backed by a custom implementation, or they @@ -137,8 +137,8 @@ are shortcuts for a longer query. ## Values Values are the data you want to search for. There are different kinds -of that, too: there are text-based values, numbers, boolean and dates. -When multiple values are allowed, they must be separated by comma `,`. +of that, too: there are text values, numbers, boolean and dates. When +multiple values are allowed, they must be separated by comma `,`. ### Text Values @@ -152,6 +152,7 @@ these characters: - parens `()` Any quotes inside a quoted string must be escaped with a backslash. + Examples: `scan_123`, `a-b-c`, `x.y.z`, `"scan from today"`, `"a \"strange\" name.pdf"` @@ -185,14 +186,15 @@ prefixed by `ms`. The time part is ignored. Examples: #### Calculation -Dates can be defined by providing a base date and a period to add or -substract. This is especially useful with the `today` pattern. The -period must be separated from the date by a semi-colon `;`. Then write -a `+` or a `-` to add or substract and at last the number of days -(suffix `d`) or months (suffix `m`). +Dates can be defined by providing a base date via the forms above and +a period to add or subtract. This is especially useful with the +`today` pattern. The period must be separated from the date by a +semi-colon `;`. Then write a `+` or a `-` to add or subtract and at +last the number of days (suffix `d`) or months (suffix `m`). Examples: `today;-14d`, `2020-02;+1m` + # Simple Expressions Simple expressions are made up of a field with at most one value, an @@ -201,24 +203,29 @@ except for boolean fields. The like operator `:` can be used with all values, but makes only sense for text values. It allows to do a substring search for a field. 
-For example, to look for an item with a name of exactly 'invoice_22': + +For example, this looks for an item with a name of exactly +'invoice_22': ``` name=invoice_22 ``` -Using `:` it is possible to look for items that have 'invoice' in -their name: +By using `:`, it is possible to look for items that have 'invoice' +somewhere in their name: ``` name:*invoice* ``` The asterisk `*` can be added at the beginning and/or end of the -value. Furthermore, the like operator is case-insensitive, whereas `=` -is not. This applies to all fields with a text value; this is another -example looking for a correspondent person of with 'marcus' in the -name: +value, but not in between. Furthermore, the like operator is +case-insensitive, whereas `=` is not. This applies to all fields with +a text value. + +This is another example looking for a correspondent person with +'marcus' in the name: + ``` corr.pers.name:*marcus* ``` @@ -233,9 +240,9 @@ operators don't make sense and therefore don't work there. ---- All these fields (except boolean fields) allow to use the in-operator, -`~=`. This is a more efficient form to specify a list of alternatives -and is logically the same as combining multiple expressions with -`OR`. For example: +`~=`. This is a more efficient form to specify a list of alternative +values for the same field. It is logically the same as combining +multiple expressions with `OR`. For example: ``` source~=webapp,mailbox @@ -287,9 +294,9 @@ incoming:no # Tags -Tags have their own syntax, because they are an important tool for -organizing items. Tags only allow for two operators: `=` and `:`. -Combined with negation (the `!` operator), this is quite flexible. +Tags have their own syntax, because they can appear multiple times on +an item. Tags only allow for two operators: `=` and `:`. Combined with +negation (the `!` operator), this is quite flexible. 
For tags, `=` means that items must have *all* specified tags (or more), while `:` means that items must have at least *one* of the @@ -298,12 +305,14 @@ given as a comma separated list (just like when using the in-operator). Some examples: Find all invoices that are todo: + ``` tag=invoice,todo ``` -This returns all items that have tags `invoice` and `todo` – and -possible some other tags. Negating this: +This returns all items that have both tags `invoice` and `todo`. +Negating this: + ``` !tag=invoice,todo ``` @@ -332,9 +341,10 @@ instead of `tag`. The field `cat` can be used the same way to search for tag categories. + # Custom Fields -Custom fields are implemented via the following syntax: +Custom fields can be used via the following syntax: ``` f: @@ -378,8 +388,8 @@ f.id:J2ES1Z4Ni9W-xw1VdFbt3KA-rL725kuyVzh-7La95Yw7Ax2:15.00 # Fulltext Search The special field `content` allows to add a fulltext search. Using -this is currently restricted: it must occur in the root query and -cannot be nested in other complex expressions. +this is currently restricted: it must occur in the root (AND) query +and cannot be nested in other complex expressions. The form is: @@ -396,11 +406,19 @@ For example, do a fulltext search for 'red needle': content:"red needle" ``` -It can be combined in an AND expression (but not deeper): +It can be combined in an AND expression: + ``` content:"red needle" tag:todo ``` +But it can't be combined via OR. This is not possible: + +``` +tag:todo (| content:"red needle" tag:waiting) +``` + + # File Checksums @@ -419,16 +437,17 @@ checksum:40675c22ab035b8a4ffe760732b65e5f1d452c59b44d3d0a2a08a95d28853497 # Exist The `exist` field can be used with another field, to check whether an -item has some value for a given field. It only works for fields that -have at most one value. +item has some value for it. It only works for fields that have at most +one value. 
-For example, it could be used to find fields that are in any folder: +For example, it could be used to find items that are in any folder: ``` exist:folder ``` When negating, it finds all items that are not in a folder: + ``` !exist:folder ``` diff --git a/website/site/content/docs/webapp/autotagging.md b/website/site/content/docs/webapp/autotagging.md new file mode 100644 index 00000000..b2e91b36 --- /dev/null +++ b/website/site/content/docs/webapp/autotagging.md @@ -0,0 +1,33 @@ ++++ +title = "Auto Tagging" +weight = 90 +[extra] +mktoc = true ++++ + + +Auto-Tagging must be enabled in the collective profile. Docspell can +go through your items periodically and learn from your existing tags. +But not all tags are suited for learning. Docspell can only learn +relationships between tags and the document's extracted text. Thus, +all tags that don't relate to the contents of a document should be +excluded. + +For example, assume there is a tag `Done` that is associated to all +items that have been worked on. Over time, most of the items have this +tag. Whether an item is tagged with `Done` or not cannot be well +determined by looking at the text of the document. It would mean that +Docspell could learn relationships that are not correct and then tag +the next incoming items with `Done`. + +{{ figure(file="collective-settings-autotag.png") }} + +That is why you need to specify what tags to learn. This is done by +defining a whitelist or a blacklist of tag categories. When defining a +whitelist, then only tags in these categories are selected for +learning. When defining a blacklist, all tags *except* those in the +list are chosen for learning. + +The *Schedule* allows to define at what intervals tags should be +learned. When clicking the *Start Now* button, the task is submitted +immediately. 
diff --git a/website/site/content/docs/webapp/collective-settings-autotag.png b/website/site/content/docs/webapp/collective-settings-autotag.png new file mode 100644 index 00000000..2999afbc Binary files /dev/null and b/website/site/content/docs/webapp/collective-settings-autotag.png differ diff --git a/website/site/content/docs/webapp/finding.md b/website/site/content/docs/webapp/finding.md index 88bc1fef..f100d8e2 100644 --- a/website/site/content/docs/webapp/finding.md +++ b/website/site/content/docs/webapp/finding.md @@ -18,6 +18,10 @@ The search bar let's you search in item and attachment names names and do fulltext search. The icon next to the search field can switch between these modes. +In the user profile, you can switch this search bar to "power search" +mode. This then allows you to enter [complex +queries](@/docs/query/_index.md). + ## The *Names* option {#names} This searches in the item name, names of correspondent organization