mirror of
https://github.com/TheAnachronism/docspell.git
synced 2025-04-02 09:05:08 +00:00
Update docs
This commit is contained in:
parent
177488817d
commit
8d539e138c
@ -405,7 +405,30 @@ allows comments and has some [advanced
|
||||
features](https://github.com/lightbend/config#features-of-hocon).
|
||||
Please refer to their documentation for more on this.
|
||||
|
||||
Here are the default configurations.
|
||||
A short description (please see the links for better understanding):
|
||||
The config consists of key-value pairs and can be written in a
|
||||
JSON-like format (called HOCON). Keys are organized in trees, and a
|
||||
key defines a full path into the tree. There are two ways:
|
||||
|
||||
```
|
||||
a.b.c.d=15
|
||||
```
|
||||
|
||||
or
|
||||
|
||||
```
|
||||
a {
|
||||
b {
|
||||
c {
|
||||
d = 15
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Both are exactly the same and these forms are both used at the same
|
||||
time. Usually the braces approach is used to group some more settings,
|
||||
for better readability.
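
As a rough illustration, the equivalence of the two forms can be sketched in Python. This is not how the HOCON library works internally; the `flatten` helper and the dictionaries are hypothetical and only model the idea that a nested block is the same as a set of dotted key paths.

```python
def flatten(tree, prefix=""):
    """Turn a nested dict into a flat dict keyed by dotted paths."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the sub-tree, extending the path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# The braces form from above, modeled as nested dicts.
nested = {"a": {"b": {"c": {"d": 15}}}}
# The dotted form from above, modeled as a single path.
dotted = {"a.b.c.d": 15}

print(flatten(nested) == dotted)
```

Both spellings end up as the single path `a.b.c.d`, which is why they can be mixed freely in one file.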
|
||||
|
||||
|
||||
# Default Config
|
||||
|
@ -114,6 +114,11 @@ Currently there exists a bash script to import files and metadata from
|
||||
[Paperless](https://github.com/the-paperless-project/paperless/).
|
||||
Please see this [issue](https://github.com/eikek/docspell/issues/358).
|
||||
|
||||
## Why another DMS?
|
||||
|
||||
Back when Docspell started, there weren't as many options as there are
|
||||
now. I wanted to try out a different approach. You can read more about
|
||||
that [here](@/docs/intro/_index.md#rationale).
|
||||
|
||||
## Wh…?
|
||||
|
||||
|
@ -11,26 +11,22 @@ mktoc = true
|
||||
|
||||
Docspell aims to be a simple yet effective document organizer that
|
||||
makes stowing documents away very quick and finding them later
|
||||
reliable (and also fast). It doesn't require technical background or
|
||||
studying huge manuals in order to use it. With this in mind, it is
|
||||
rather opinionated and more targeted for home use and small/medium
|
||||
organizations.
|
||||
reliable (and also fast). It is a bit opinionated and more targeted
|
||||
for home use and small/medium organizations.
|
||||
|
||||
In contrast to many DMS, the main focus is not so much to provide all
|
||||
kinds of features to manually create organizational structures, like
|
||||
folder hierarchies, where you place the documents yourself. The
|
||||
approach is more to leave it as a big pile of documents, but extract
|
||||
and attach metadata from each document. These are mainly properties
|
||||
that emerge from the document itself. The reason is that this is
|
||||
possible to automate and that many custom folder structures include
|
||||
these metadata somewhere, too. This makes it very simple to *add*
|
||||
documents, because there is no time spent to think about where to put
|
||||
it. And it is possible to apply different structures on top later,
|
||||
like show first all documents of a specific correspondent, then all
|
||||
with tag 'invoice', etc. If these properties are attached to all
|
||||
documents, it is really easy to find a document. It even can be
|
||||
combined with fulltext search for the, hopefully rare, desperate
|
||||
cases.
|
||||
approach is to leave it as a big pile of documents, but extract and
|
||||
attach metadata from each document. These are mainly properties that
|
||||
emerge from the document itself. The reason is that this is possible
|
||||
to automate. This makes it very simple to *add* documents, because
|
||||
there is no time spent to think about where to put it. And it is
|
||||
possible to apply different structures on top later, like show first
|
||||
all documents of a specific correspondent, then all with tag
|
||||
'invoice', etc. If these properties are attached to all documents, it
|
||||
is really easy to find a document. It even can be combined with
|
||||
fulltext search for the, hopefully rare, desperate cases.
|
||||
|
||||
Of course, it is also possible to add custom properties and arbitrary
|
||||
tags.
|
||||
@ -153,3 +149,115 @@ create a folder and associate members. It is possible to put items in
|
||||
these folders and docspell shows only items that are either in no
|
||||
specific folder or in a folder where the current user is owner or
|
||||
member.
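
The visibility rule described above can be sketched as a small predicate. This is a minimal Python sketch under the assumption that a folder has one owner and a set of members; the `Folder`, `Item` and `visible_to` names are hypothetical and not taken from Docspell's code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Folder:
    name: str
    owner: str
    members: set = field(default_factory=set)


@dataclass
class Item:
    name: str
    folder: Optional[Folder] = None


def visible_to(user, item):
    """An item is shown if it is in no folder, or the user owns or is a member of its folder."""
    if item.folder is None:
        return True
    return user == item.folder.owner or user in item.folder.members


taxes = Folder("taxes", owner="alice", members={"bob"})
print(visible_to("carol", Item("letter.pdf")))
print(visible_to("carol", Item("tax-2019.pdf", taxes)))
```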
|
||||
|
||||
# Rationale
|
||||
|
||||
In 2019, I started to think about creating a dms-like tool that is now
|
||||
Docspell. It started at the end of that year with the initial version,
|
||||
including the very basic idea around which I want to create some kind
|
||||
of document management system.
|
||||
|
||||
The following anecdote summarizes why I thought yet another dms-like
|
||||
tool might be useful.
|
||||
|
||||
I tried some DMS at that time, to see whether they could help me with
|
||||
the ever growing pile of documents. It's not just postal mail, now it
|
||||
gets mixed with invoices via e-mail, bank statements I need to
|
||||
download at some "portal" etc. It's all getting a huge mess. When
|
||||
looking for a specific document, it's hard to find.
|
||||
|
||||
I found that all the enterprise DMS are way beyond what I need. They are
|
||||
rather difficult to set up and very hard to explain to non-technical
|
||||
people. They offer a lot of features and there is quite some time
|
||||
required to extract what's needed. I then discovered tools that seemed
|
||||
to better suit my needs. Their design was simple and very close to
|
||||
what I was looking for, making them a good fit for a single user. There
|
||||
were only a few things to nag:
|
||||
|
||||
1. Often it was not possible to track multiple files as one "unit".
|
||||
For example: reports with accompanying pictures that I would like
|
||||
to treat as a single unit. It also more naturally fits to the
|
||||
common e-mail.
|
||||
2. Missing good multi-user support; and/or a simple enough interface
|
||||
so that non-technical users can also make sense of it.
|
||||
3. Missing some features important to me, like "send this by mail", a
|
||||
full REST api, and some more
|
||||
4. Still a lot of "manual" organizing of documents.
|
||||
|
||||
These are not big complaints, they are solvable somehow. I want to
|
||||
focus on the last point: most systems didn't offer help with
|
||||
organizing the documents. I didn't find any that included basic
|
||||
machine learning features. On most systems it was possible to organize
|
||||
documents into a custom folder structure. But it was all manual. You
|
||||
would need to move incoming documents into some subfolder. Some
|
||||
systems offered rules that get applied to documents in order to put
|
||||
them into the right place. Many offered tags, too, which relieves some
|
||||
of the weight of this task. But they were also all manual. So the idea
|
||||
came to let the computer do a little more to help organize documents.
|
||||
|
||||
Let's start with the rules approach: A rule may look like this:
|
||||
|
||||
> when the document contains a text 'invoice' and 'repair company x',
|
||||
> then put it in subfolder B.
|
||||
|
||||
This rule can be applied to all the new documents to get automatically
|
||||
placed into this subfolder. I think there are some drawbacks to this
|
||||
approach:
|
||||
|
||||
- rules may change over time. Then you either must re-apply them all
|
||||
to all documents or leave older ones where they are. If re-applying
|
||||
them, some documents may not be in places as before which can easily
|
||||
confuse coworkers.
|
||||
- these rules may interfere with each other, then it might get more
|
||||
difficult to know where a document is
|
||||
- rules can become complex, be comprised of regular expressions, which
|
||||
are really only suited to technical people and need to be
|
||||
maintained.
|
||||
|
||||
I decided to try out a different approach: a "search-only" one¹.
|
||||
Instead of using a manually created folder structure, I simply search
|
||||
every time using this rule. In essence such a rule is a search query.
|
||||
But searching with rules like the one above is not very efficient. One
|
||||
would need to do fulltext searches, even extracting dates "on the fly"
|
||||
etc. It wouldn't be very reliable either. That's why documents have
|
||||
properties (called metadata). In my case most of them have a
|
||||
correspondent, a date and so on. If these properties were defined on
|
||||
documents, the queries become quite efficient. The idea is now, not to
|
||||
use rules for moving documents to some place, but for attaching
|
||||
properties, information, to each document. This solves a few issues:
|
||||
they can't get easily out of sync, and they can't interfere. Then
|
||||
docspell can help with finding some of these properties automatically.
|
||||
For example: it can propose properties by looking at the text. It can
|
||||
also take existing documents into account when suggesting tags. In
|
||||
docspell, it is not possible to define custom rules, instead it tries
|
||||
to find these rules for you by looking at the text and your previous
|
||||
documents.
|
||||
|
||||
That said, there is still a manual process involved, but I found it
|
||||
much lighter. Once in a while, looking at new documents and confirming
|
||||
or fixing the metadata is necessary. This doesn't involve deciding on
|
||||
a place, though. Which properties you want to track can be
|
||||
configured; should you only need a correspondent and a date,
|
||||
everything else can be hidden.
|
||||
|
||||
So in docspell, all documents are just in one big pile… but every
|
||||
document has metadata attached that can be used to quickly find what
|
||||
you need. There is no folder structure, but it is possible to later
|
||||
apply certain hierarchical structures. It would be possible to create
|
||||
a "folder structure", like the one mentioned above: click on
|
||||
correspondent `repair company x`; then on tag `invoice`, then
|
||||
`concerning=car` and `year=2019`. A UI could be created to present
|
||||
exactly this hierarchy. Since I can't know your preferred structure
|
||||
(not even my own…!), the docspell ui allows every combination,
|
||||
regardless of any hierarchies. You can first select a correspondent, then
|
||||
a tag or the other way around. Usually it's not necessary to go very
|
||||
deep.
|
||||
|
||||
That's all about it! I thought why not try this approach and at the
|
||||
same time learn about some technologies around. In the last year,
|
||||
docspell evolved to a quite usable tool, imho. This was only possible,
|
||||
because very nice people gave valuable feedback and ideas!
|
||||
|
||||
|
||||
¹This is inspired by tools like
|
||||
[mu](https://www.djcbsoftware.nl/code/mu/) and GMail.
|
||||
|
@ -9,7 +9,7 @@ mktoc = true
|
||||
|
||||
|
||||
Docspell uses a query language to provide a powerful way to search for
|
||||
your documents. It is targeted at advanced users and it needs to be
|
||||
your documents. It is targeted at "power users" and it needs to be
|
||||
enabled explicitly in your user settings.
|
||||
|
||||
<div class="colums">
|
||||
@ -62,7 +62,7 @@ There are 7 operators:
|
||||
- `>` for greater-than
|
||||
- `>=` for greater-equals
|
||||
- `~=` for "in" (a shorter way to say "a or b or c or d")
|
||||
- `:` for "like"
|
||||
- `:` for "like", this is used in a context-sensitive way
|
||||
- `<` for lower than
|
||||
- `<=` for lower-equal
|
||||
- `!=` for not-equals
|
||||
@ -76,7 +76,7 @@ what operators are allowed. There are fields where an item can have at
|
||||
most one value (like `name` or `notes`) and there are fields where an
|
||||
item can have multiple values (like `tag`). Finally, there are special
|
||||
fields that are either implemented directly using custom sql or that
|
||||
are shortcuts to a longer form.
|
||||
are only shortcuts to a longer form.
|
||||
|
||||
Here is the list of all available fields.
|
||||
|
||||
@ -104,7 +104,7 @@ These fields map to at most one value:
|
||||
show only incoming, `false` to show only outgoing.
|
||||
|
||||
These fields support all operators, except `incoming` and `inbox`
|
||||
which expect boolean values and there these operators don't make much
|
||||
which expect boolean values and for those some operators don't make
|
||||
sense.
|
||||
|
||||
Fields that map to more than one value:
|
||||
@ -117,18 +117,18 @@ The tag and category fields use two operators: `:` and `=`.
|
||||
|
||||
Other special fields:
|
||||
|
||||
- `attach.id`
|
||||
- `checksum`
|
||||
- `content`
|
||||
- `attach.id` references the id of an attachment
|
||||
- `checksum` references the sha256 checksum of a file
|
||||
- `content` for fulltext search
|
||||
- `f` for referencing custom fields by name
|
||||
- `f.id` for referencing custom fields by their id
|
||||
- `dateIn` a shortcut for a range search
|
||||
- `dueIn` a shortcut for a range search
|
||||
- `exist` check if some property exists
|
||||
- `names`
|
||||
- `year`
|
||||
- `conc`
|
||||
- `corr`
|
||||
- `names` a shortcut to search in several names via `:`
|
||||
- `year` a shortcut for a year range
|
||||
- `conc` a shortcut for concerning person and equipment names
|
||||
- `corr` a shortcut for correspondent org and person names
|
||||
|
||||
These fields often use the `:` operator to simply separate field
|
||||
and value. They are often backed by a custom implementation, or they
|
||||
@ -137,8 +137,8 @@ are shortcuts for a longer query.
|
||||
## Values
|
||||
|
||||
Values are the data you want to search for. There are different kinds
|
||||
of that, too: there are text-based values, numbers, boolean and dates.
|
||||
When multiple values are allowed, they must be separated by comma `,`.
|
||||
of that, too: there are text values, numbers, boolean and dates. When
|
||||
multiple values are allowed, they must be separated by comma `,`.
|
||||
|
||||
### Text Values
|
||||
|
||||
@ -152,6 +152,7 @@ these characters:
|
||||
- parens `()`
|
||||
|
||||
Any quotes inside a quoted string must be escaped with a backslash.
|
||||
|
||||
Examples: `scan_123`, `a-b-c`, `x.y.z`, `"scan from today"`, `"a \"strange\"
|
||||
name.pdf"`
|
||||
|
||||
@ -185,14 +186,15 @@ prefixed by `ms`. The time part is ignored. Examples:
|
||||
|
||||
#### Calculation
|
||||
|
||||
Dates can be defined by providing a base date and a period to add or
|
||||
subtract. This is especially useful with the `today` pattern. The
|
||||
period must be separated from the date by a semi-colon `;`. Then write
|
||||
a `+` or a `-` to add or subtract and at last the number of days
|
||||
(suffix `d`) or months (suffix `m`).
|
||||
Dates can be defined by providing a base date via the forms above and
|
||||
a period to add or subtract. This is especially useful with the
|
||||
`today` pattern. The period must be separated from the date by a
|
||||
semi-colon `;`. Then write a `+` or a `-` to add or subtract and at
|
||||
last the number of days (suffix `d`) or months (suffix `m`).
|
||||
|
||||
Examples: `today;-14d`, `2020-02;+1m`
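
To make the arithmetic concrete, here is a minimal Python sketch of how such an expression could be resolved. It is not Docspell's parser. The `resolve`, `parse_base` and `add_months` helpers are hypothetical, only the `today` and dash-separated base forms are handled (the `ms` form is left out), and two points are assumptions: a partial date such as `2020-02` is taken as the first day of that month, and adding months clamps to the last day of the target month.

```python
import calendar
import re
from datetime import date, timedelta

# A period is a sign, a number and a unit: d = days, m = months.
_PERIOD = re.compile(r"([+-])(\d+)([dm])")


def parse_base(text, today):
    """Resolve the base date; missing month or day default to 1 (assumption)."""
    if text == "today":
        return today
    parts = [int(p) for p in text.split("-")]
    parts += [1] * (3 - len(parts))
    return date(*parts)


def add_months(day, months):
    """Shift by whole months, clamping the day to the month's length (assumption)."""
    index = day.year * 12 + (day.month - 1) + months
    year, month0 = divmod(index, 12)
    last = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last))


def resolve(expr, today):
    """Resolve '<base>' or '<base>;<+|-><n><d|m>' to a concrete date."""
    base_text, _, period = expr.partition(";")
    base = parse_base(base_text, today)
    if not period:
        return base
    match = _PERIOD.fullmatch(period)
    if match is None:
        raise ValueError(f"invalid period: {period!r}")
    sign, amount, unit = match.groups()
    amount = int(amount) if sign == "+" else -int(amount)
    if unit == "d":
        return base + timedelta(days=amount)
    return add_months(base, amount)


print(resolve("today;-14d", today=date(2021, 3, 15)))
print(resolve("2020-02;+1m", today=date(2021, 3, 15)))
```

With `today` fixed to 2021-03-15, the first expression resolves to 2021-03-01; under the first-of-month assumption the second resolves to 2020-03-01.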
|
||||
|
||||
|
||||
# Simple Expressions
|
||||
|
||||
Simple expressions are made up of a field with at most one value, an
|
||||
@ -201,24 +203,29 @@ except for boolean fields.
|
||||
|
||||
The like operator `:` can be used with all values, but only makes
|
||||
sense for text values. It allows a substring search on a field.
|
||||
For example, to look for an item with a name of exactly 'invoice_22':
|
||||
|
||||
For example, this looks for an item with a name of exactly
|
||||
'invoice_22':
|
||||
|
||||
```
|
||||
name=invoice_22
|
||||
```
|
||||
|
||||
Using `:` it is possible to look for items that have 'invoice' in
|
||||
their name:
|
||||
By using `:`, it is possible to look for items that have 'invoice'
|
||||
somewhere in their name:
|
||||
|
||||
```
|
||||
name:*invoice*
|
||||
```
|
||||
|
||||
The asterisk `*` can be added at the beginning and/or end of the
|
||||
value. Furthermore, the like operator is case-insensitive, whereas `=`
|
||||
is not. This applies to all fields with a text value; this is another
|
||||
example looking for a correspondent person with 'marcus' in the
|
||||
name:
|
||||
value, but not in between. Furthermore, the like operator is
|
||||
case-insensitive, whereas `=` is not. This applies to all fields with
|
||||
a text value.
|
||||
|
||||
This is another example looking for a correspondent person with
|
||||
'marcus' in the name:
|
||||
|
||||
```
|
||||
corr.pers.name:*marcus*
|
||||
```
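
The difference between `=` and `:` for text values can be modeled with a short Python sketch. This only mirrors the behaviour described above (case-insensitive, `*` allowed at the beginning and/or end); it is not Docspell's implementation, the `like` function is hypothetical, and treating a pattern without any asterisk as a case-insensitive equality check is an assumption.

```python
def like(value, pattern):
    """Case-insensitive match with an optional '*' at the start and/or end."""
    value, pattern = value.lower(), pattern.lower()
    leading = pattern.startswith("*")
    trailing = len(pattern) > 1 and pattern.endswith("*")
    core = pattern[1:] if leading else pattern
    core = core[:-1] if trailing else core
    if leading and trailing:
        return core in value           # *text*  -> substring
    if leading:
        return value.endswith(core)    # *text   -> suffix
    if trailing:
        return value.startswith(core)  # text*   -> prefix
    return value == core               # assumption: no asterisk means equality


print(like("Invoice_22.pdf", "*invoice*"))
print(like("Marcus Miller", "*marcus*"))
print(like("my_invoice", "invoice*"))
```

The first two calls match; the third does not, because `invoice*` only matches at the start of the value.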
|
||||
@ -233,9 +240,9 @@ operators don't make sense and therefore don't work there.
|
||||
----
|
||||
|
||||
All these fields (except boolean fields) allow using the in-operator,
|
||||
`~=`. This is a more efficient form to specify a list of alternatives
|
||||
and is logically the same as combining multiple expressions with
|
||||
`OR`. For example:
|
||||
`~=`. This is a more efficient form to specify a list of alternative
|
||||
values for the same field. It is logically the same as combining
|
||||
multiple expressions with `OR`. For example:
|
||||
|
||||
```
|
||||
source~=webapp,mailbox
|
||||
@ -287,9 +294,9 @@ incoming:no
|
||||
|
||||
# Tags
|
||||
|
||||
Tags have their own syntax, because they are an important tool for
|
||||
organizing items. Tags only allow for two operators: `=` and `:`.
|
||||
Combined with negation (the `!` operator), this is quite flexible.
|
||||
Tags have their own syntax, because they can appear multiple times on
|
||||
an item. Tags only allow for two operators: `=` and `:`. Combined with
|
||||
negation (the `!` operator), this is quite flexible.
|
||||
|
||||
For tags, `=` means that items must have *all* specified tags (or
|
||||
more), while `:` means that items must have at least *one* of the
|
||||
@ -298,12 +305,14 @@ given as a comma separated list (just like when using the
|
||||
in-operator).
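
In set terms, `=` is a subset test and `:` is an intersection test. The following Python sketch models only that distinction; the function names are hypothetical and case handling is ignored.

```python
def tag_equals(item_tags, wanted):
    """Models `tag=a,b`: the item must carry all wanted tags (and may carry more)."""
    return set(wanted) <= set(item_tags)


def tag_like(item_tags, wanted):
    """Models `tag:a,b`: the item must carry at least one of the wanted tags."""
    return bool(set(wanted) & set(item_tags))


item = {"invoice", "car"}
print(tag_equals(item, ["invoice", "todo"]))      # item lacks `todo`
print(tag_like(item, ["invoice", "todo"]))        # item has `invoice`
print(not tag_equals(item, ["invoice", "todo"]))  # negation with `!`
```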
|
||||
|
||||
Some examples: Find all invoices that are todo:
|
||||
|
||||
```
|
||||
tag=invoice,todo
|
||||
```
|
||||
|
||||
This returns all items that have tags `invoice` and `todo` – and
|
||||
possibly some other tags. Negating this:
|
||||
This returns all items that have both tags `invoice` and `todo`.
|
||||
Negating this:
|
||||
|
||||
```
|
||||
!tag=invoice,todo
|
||||
```
|
||||
@ -332,9 +341,10 @@ instead of `tag`.
|
||||
|
||||
The field `cat` can be used the same way to search for tag categories.
|
||||
|
||||
|
||||
# Custom Fields
|
||||
|
||||
Custom fields are implemented via the following syntax:
|
||||
Custom fields can be used via the following syntax:
|
||||
|
||||
```
|
||||
f:<field-name><operator><value>
|
||||
@ -378,8 +388,8 @@ f.id:J2ES1Z4Ni9W-xw1VdFbt3KA-rL725kuyVzh-7La95Yw7Ax2:15.00
|
||||
# Fulltext Search
|
||||
|
||||
The special field `content` allows adding a fulltext search. Using
|
||||
this is currently restricted: it must occur in the root query and
|
||||
cannot be nested in other complex expressions.
|
||||
this is currently restricted: it must occur in the root (AND) query
|
||||
and cannot be nested in other complex expressions.
|
||||
|
||||
The form is:
|
||||
|
||||
@ -396,11 +406,19 @@ For example, do a fulltext search for 'red needle':
|
||||
content:"red needle"
|
||||
```
|
||||
|
||||
It can be combined in an AND expression (but not deeper):
|
||||
It can be combined in an AND expression:
|
||||
|
||||
```
|
||||
content:"red needle" tag:todo
|
||||
```
|
||||
|
||||
But it can't be combined via OR. This is not possible:
|
||||
|
||||
```
|
||||
tag:todo (| content:"red needle" tag:waiting)
|
||||
```
|
||||
|
||||
|
||||
|
||||
# File Checksums
|
||||
|
||||
@ -419,16 +437,17 @@ checksum:40675c22ab035b8a4ffe760732b65e5f1d452c59b44d3d0a2a08a95d28853497
|
||||
# Exist
|
||||
|
||||
The `exist` field can be used with another field, to check whether an
|
||||
item has some value for a given field. It only works for fields that
|
||||
have at most one value.
|
||||
item has some value for it. It only works for fields that have at most
|
||||
one value.
|
||||
|
||||
For example, it could be used to find fields that are in any folder:
|
||||
For example, it could be used to find items that are in any folder:
|
||||
|
||||
```
|
||||
exist:folder
|
||||
```
|
||||
|
||||
When negating, it finds all items that are not in a folder:
|
||||
|
||||
```
|
||||
!exist:folder
|
||||
```
|
||||
|
33
website/site/content/docs/webapp/autotagging.md
Normal file
@ -0,0 +1,33 @@
|
||||
+++
|
||||
title = "Auto Tagging"
|
||||
weight = 90
|
||||
[extra]
|
||||
mktoc = true
|
||||
+++
|
||||
|
||||
|
||||
Auto-Tagging must be enabled in the collective profile. Docspell can
|
||||
go through your items periodically and learn from your existing tags.
|
||||
But not all tags are suited for learning. Docspell can only learn
|
||||
relationships between tags and the document's extracted text. Thus,
|
||||
all tags that don't relate to the contents of a document should be
|
||||
excluded.
|
||||
|
||||
For example, assume there is a tag `Done` that is associated to all
|
||||
items that have been worked on. Over time, most of the items have this
|
||||
tag. Whether an item is tagged with `Done` or not cannot be well
|
||||
determined by looking at the text of the document. It would mean that
|
||||
Docspell could learn relationships that are not correct and then tag
|
||||
the next incoming items with `Done`.
|
||||
|
||||
{{ figure(file="collective-settings-autotag.png") }}
|
||||
|
||||
That is why you need to specify what tags to learn. This is done by
|
||||
defining a whitelist or a blacklist of tag categories. When defining a
|
||||
whitelist, then only tags in these categories are selected for
|
||||
learning. When defining a blacklist, all tags *except* the ones in the
|
||||
list are chosen for learning.
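
The selection rule can be sketched in a few lines of Python. This is only an illustration of the whitelist/blacklist semantics described above, not Docspell's code; the `tags_to_learn` function, the `mode` strings and the example tags are hypothetical. How tags without a category are treated is an assumption that follows from reading the rule literally: they are never in a whitelist and never in a blacklist.

```python
def tags_to_learn(tag_categories, listed, mode):
    """Pick the tags used for learning.

    tag_categories maps a tag name to its category (or None);
    listed is the configured list of categories;
    mode is "whitelist" or "blacklist".
    """
    listed = set(listed)
    if mode == "whitelist":
        return {tag for tag, cat in tag_categories.items() if cat in listed}
    if mode == "blacklist":
        return {tag for tag, cat in tag_categories.items() if cat not in listed}
    raise ValueError(f"unknown mode: {mode!r}")


tags = {"Invoice": "doctype", "Contract": "doctype", "Done": "state", "Todo": "state"}
print(sorted(tags_to_learn(tags, ["doctype"], "whitelist")))
print(sorted(tags_to_learn(tags, ["state"], "blacklist")))
```

In this example both configurations select `Contract` and `Invoice` and keep the workflow tags `Done` and `Todo` out of learning.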
|
||||
|
||||
The *Schedule* lets you define at what intervals tags should be
|
||||
learned. When clicking the *Start Now* button, the task is submitted
|
||||
immediately.
|
BIN
website/site/content/docs/webapp/collective-settings-autotag.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 185 KiB |
@ -18,6 +18,10 @@ The search bar lets you search in item and attachment names and
|
||||
do fulltext search. The icon next to the search field can switch
|
||||
between these modes.
|
||||
|
||||
In the user profile, you can switch this search bar to "power search"
|
||||
mode. This then allows you to enter [complex
|
||||
queries](@/docs/query/_index.md).
|
||||
|
||||
## The *Names* option {#names}
|
||||
|
||||
This searches in the item name, names of correspondent organization
|
||||
|