Initial website

This commit is contained in:
Eike Kettner
2020-07-27 22:13:22 +02:00
parent dbd0f3ff97
commit f8c6f79b10
160 changed files with 8854 additions and 64 deletions

@ -0,0 +1,12 @@
+++
title = "Web-UI"
summary = true
description = "This section describes the features of the web application."
weight = 50
insert_anchor_links = "right"
template = "pages.html"
sort_by = "weight"
redirect_to = "docs/webapp/uploading"
+++
No content here.

@ -0,0 +1,66 @@
+++
title = "Curate Items"
weight = 20
+++
Curating the metadata of your items helps you find them later. This
page describes how you can quickly go through those items and correct
or amend them with existing data.
## Select New items
After files have been uploaded and the job executor has created the
corresponding items, they show up on the main page. All items that
the job executor has created are initially marked as *New*. The option
*only New* in the left search menu can be used to select only new
items:
{{ figure(file="docspell-curate-1.jpg") }}
## Check selected items
Then you can go through all new items and check their metadata: Click
on the first item to open the detail view. This shows the documents
and the meta data in the header.
{{ figure(file="docspell-curate-2.jpg") }}
## Modify if necessary
To change something, click the *Edit* button in the menu above the
document view. This will open a form next to your documents. You can
compare the data with the documents and change as you like. Since the
item status is *New*, you'll see the suggestions docspell found during
processing. If there were multiple candidates, you can select another
one by clicking its name in the suggestion list.
{{ figure(file="docspell-curate-3.jpg") }}
When you change something in the form, it is applied immediately. Only
text fields require a click on the *Save* symbol next to the field.
## Confirm
If everything looks good, click the *Confirm* button to confirm the
current data. The *New* status goes away and also the suggestions are
hidden in this state. You can always go back by clicking the
*Unconfirm* button.
{{ figure(file="docspell-curate-5.jpg") }}
## Proceed with next item
To look at the next item in the search results, click the *Next*
button in the menu (next to the *Edit* button). Clicking *Next* keeps
the current view, so you can continue checking the data. If you are on
the last item, clicking *Next* switches to the listing view.
{{ figure(file="docspell-curate-6.jpg") }}


@ -0,0 +1,234 @@
+++
title = "E-Mail Settings"
weight = 40
[extra]
mktoc = true
+++
Docspell integrates well with e-mail. You can send e-mails related to
an item and you can import e-mails from your mailbox into docspell.
This requires settings for sending and receiving e-mails. E-mails are
commonly sent via
[SMTP](https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol)
and commonly received via
[IMAP](https://en.wikipedia.org/wiki/Internet_Message_Access_Protocol).
Docspell supports both SMTP and IMAP. These settings are associated
with a user, so each user can specify their own settings separately
from others in the collective.
*Note: Passwords to your e-mail accounts are stored in plain-text in
docspell's database. This is necessary to have docspell connect to
your e-mail account to send mails on behalf of you and receive your
mails.*
## SMTP Settings
For sending mail, you need to provide information to connect to an
SMTP server. Every e-mail provider makes this information available
somewhere.
Configure this in *User Settings -> E-Mail Settings (SMTP)*:
{{ figure(file="mail-settings-1.png") }}
First, you need to provide some name that is used to recognize this
account. This name is also used in URLs to docspell and so it must not
contain whitespace or any special characters. A good value is the
domain of your provider, for example `gmail.com`, or something like
that.
This information should be available from your e-mail provider. For
example, for Gmail it is:
- SMTP Host: `smtp.gmail.com`
- SMTP Port: `587` or `465`
- SMTP User: Your Gmail address (for example, example@gmail.com)
- SMTP Password: Your Gmail password
- SSL: use `SSL` for port `465` and `StartTLS` for port `587`
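As a rough sketch of how the port and the *SSL* setting belong together, the following helper maps the two Gmail ports listed above to the matching mode and prints an `openssl s_client` command that could be run by hand to probe the connection. The function names `ssl_mode_for_port` and `smtp_probe_command` are made up for this illustration; they are not part of docspell.

```bash
#!/usr/bin/env bash

# Map an SMTP port to the value of the SSL field (illustrative helper).
ssl_mode_for_port() {
  case "$1" in
    465) echo "SSL" ;;
    587) echo "StartTLS" ;;
    *)   echo "unknown"; return 1 ;;
  esac
}

# Print (but do not run) an openssl command that opens a matching
# test connection to the given host and port.
smtp_probe_command() {
  local host="$1" port="$2"
  case "$(ssl_mode_for_port "$port")" in
    SSL)      echo "openssl s_client -connect $host:$port" ;;
    StartTLS) echo "openssl s_client -starttls smtp -connect $host:$port" ;;
    *)        return 1 ;;
  esac
}

smtp_probe_command smtp.gmail.com 587
```

Running the printed command shows whether the server answers with a certificate on that port, which can help when deciding between the two modes.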
Then you need to define the e-mail address that is used for the `From`
field. This is in most cases the same address as used for the SMTP
User field.
The `Reply-To` field is optional and can be set to define a different
e-mail address that your recipients should use to answer a mail.
Once this is set up, you can start sending mails within docspell. You
can define these settings for multiple providers, so you can choose
from which account you want to send mails.
## IMAP Settings
For receiving e-mails, you need to provide information to connect to
an IMAP server. Your e-mail provider should have this information
somewhere available.
Configure this in *User Settings -> E-Mail Settings (IMAP)*:
{{ figure(file="mail-settings-2.png") }}
First you need to define a *Name* to recognize this connection inside
docspell. This name is also used in URLs to docspell and so it must
not contain whitespace or any special characters. A good value is the
domain of your provider, for example `gmail.com`, or something like
that.
You can provide IMAP connections to multiple mailboxes.
Here is an example for posteo.de:
- IMAP Server: `posteo.de`
- IMAP Port: 143
- IMAP User: Your posteo address
- IMAP Password: Your posteo password
- SSL: use `StartTLS`
## SSL / TLS / StartTLS
*Please Note: If `SSL` is set to `None`, then mails will be sent
unencrypted to your mail provider! If `Ignore certificate check` is
enabled, connections to your mail provider will succeed even if the
provider is wrongly configured for SSL/TLS. This flag should only be
enabled if you know why.*
## GMail
Authenticating with GMail may not be that simple. GMail implements an
authentication scheme called *XOAUTH2* (at least for IMAP). It will
not work with your normal password. This is to avoid giving an
application full access to your Gmail account.
The e-mail integration in docspell relies on the
[JavaMail](https://javaee.github.io/javamail) library which has
support for XOAUTH2. It also has documentation on what you need to do
on your gmail account: <https://javaee.github.io/javamail/OAuth2>.
First you need to go to the [Google Developers
Console](https://console.developers.google.com) and create an "App" to
get a Client-Id and a Client-Secret. This "App" will be your instance
of docspell. You tell google that this app may send and read your
mails and then you get an *access token* that should be used instead
of the password.
Once you have set up an App in the Google Developers Console, you get
the Client-Id and the Client-Secret, which look something like this:
- Client-Id: 106701....d8c.apps.googleusercontent.com
- Client-Secret: 5Z1...Kir_t
Google has a python tool to help with getting this access token.
Download the `oauth2.py` script from
[here](https://github.com/google/gmail-oauth2-tools) and first create
an *oauth2-token*:
``` bash
./oauth2.py --user=your.name@gmail.com \
--client_id=106701....d8c.apps.googleusercontent.com \
--client_secret=5Z1...Kir_t \
--generate_oauth2_token
```
This will "redirect you" to a URL where you have to authenticate with
Google. Afterwards it lets you add permissions to the app for
accessing your mail account. The result is another code you need to
give to the script to proceed:
```
4/zwE....q0QBAb-99yD7lw
```
Then the script produces this:
```
Refresh Token: 1//09zH.........Lj6oc2SmFlZww
Access Token: ya29.a0........SECDQ
Access Token Expiration Seconds: 3599
```
The access token can be used to sign in via IMAP with google. The
Refresh Token doesn't expire and can be used to generate new access
tokens:
```
./oauth2.py --user=your.name@gmail.com \
--client_id=106701....d8c.apps.googleusercontent.com \
--client_secret=5Z1...Kir_t \
--refresh_token=1//09zH.........Lj6oc2SmFlZww
```
Output:
```
Access Token: ya29.a0....._q-lX3ypntk3ln0h9Yk
Access Token Expiration Seconds: 3599
```
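When the token is consumed by another program, only the value on the `Access Token:` line of this output is needed. A minimal sketch of extracting it, assuming the output format shown above; the function name `extract_access_token` and the sample token are invented for illustration:

```bash
#!/usr/bin/env bash

# Read oauth2.py output on stdin and print only the access token.
# Lines such as "Access Token Expiration Seconds: ..." do not match,
# because the pattern requires the colon directly after "Access Token".
extract_access_token() {
  awk -F': ' '/^Access Token: /{ print $2; exit }'
}

# Sample output with a made-up, shortened token.
printf 'Access Token: ya29.example\nAccess Token Expiration Seconds: 3599\n' \
  | extract_access_token
```

Matching on the label rather than on the line position keeps the extraction working even if the tool prints the refresh token first.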
The problem is that the access token expires. Docspell doesn't support
updating the access token. This could be worked around by setting up a
cron job or similar that uses the `oauth2.py` tool to generate new
access tokens and updates your IMAP settings via a
[REST](@/docs/api/_index.md) call.
``` bash
#!/usr/bin/env bash
set -e
## Change this to your values:
DOCSPELL_USER="[docspell-user]"
DOCSPELL_PASSWORD="[docspell-password]"
DOCSPELL_URL="http://localhost:7880"
DOCSPELL_IMAP_NAME="gmail.com"
GMAIL_USER="your.name@gmail.com"
CLIENT_ID="106701....d8c.apps.googleusercontent.com"
CLIENT_SECRET="5Z1...Kir_t"
REFRESH_TOKEN="1//09zH.........Lj6oc2SmFlZww"
# Path to the oauth2.py tool
OAUTH_TOOL="./oauth2.py"
##############################################################################
## Script
# Login to docspell and store the auth-token
AUTH_DATA=$(curl --silent -XPOST \
-H 'Content-Type: application/json' \
--data-binary "{\"account\":\"$DOCSPELL_USER\",\"password\":\"$DOCSPELL_PASSWORD\"}" \
$DOCSPELL_URL/api/v1/open/auth/login)
# Stop here if the login failed; there is no token to continue with
if [ "$(echo "$AUTH_DATA" | jq .success)" == "false" ]; then
echo "Auth failed"
echo "$AUTH_DATA"
exit 1
fi
TOKEN="$(echo $AUTH_DATA | jq -r .token)"
# Get the imap settings
UPDATE_URL="$DOCSPELL_URL/api/v1/sec/email/settings/imap/$DOCSPELL_IMAP_NAME"
IMAP_DATA=$(curl -s -H "X-Docspell-Auth: $TOKEN" "$UPDATE_URL")
echo "Current Settings:"
echo $IMAP_DATA | jq
# Get the new access token
ACCESS_TOKEN=$($OAUTH_TOOL --user=$GMAIL_USER \
--client_id="$CLIENT_ID" \
--client_secret="$CLIENT_SECRET" \
--refresh_token="$REFRESH_TOKEN" | head -n1 | cut -d':' -f2 | xargs)
# Update settings
echo "Updating IMAP settings"
NEW_IMAP=$(echo $IMAP_DATA | jq ".imapPassword |= \"$ACCESS_TOKEN\"")
curl -s -XPUT -H "X-Docspell-Auth: $TOKEN" \
-H 'Content-Type: application/json' \
--data-binary "$NEW_IMAP" "$UPDATE_URL"
echo
echo "New Settings:"
curl -s -H "X-Docspell-Auth: $TOKEN" "$UPDATE_URL" | jq
```
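Since the access token shown above expires after 3599 seconds, such a script would have to run more often than once per hour. The following sketch builds a matching crontab entry; the script location `update-imap-token.sh`, the 30 minute interval and the function names are assumptions made for this example only.

```bash
#!/usr/bin/env bash

# Hypothetical location where the script above has been saved.
SCRIPT_PATH="${SCRIPT_PATH:-$HOME/bin/update-imap-token.sh}"

# Print a crontab entry that runs the given script every 30 minutes,
# which stays well below the token lifetime of about one hour.
cron_line() {
  printf '*/30 * * * * %s\n' "$1"
}

# Append the entry to the current user's crontab.
# This is not called automatically; run it manually if desired.
install_cron_entry() {
  { crontab -l 2>/dev/null || true; cron_line "$SCRIPT_PATH"; } | crontab -
}

cron_line "$SCRIPT_PATH"
```

The printed line can also be pasted into `crontab -e` instead of calling `install_cron_entry`.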

@ -0,0 +1,178 @@
+++
title = "Finding Items"
weight = 30
[extra]
mktoc = true
+++
Items can be searched by their annotated meta data and their contents
using full text search. The landing page shows a list of current
items. Items are displayed sorted by their date, newest first.
Docspell has two modes for searching: a simple search bar and a search
menu with many options. Both are active at the same time, but only one
is visible. You can switch between them without affecting the results.
## Search Bar
{{ imgright(file="search-bar.png") }}
By default, the search bar is shown. It provides a refined view of the
search menu. The dropdown contains different options to do a quick
search.
### *All Names* and *Contents*
These two options correspond to the same named field in the search
menu. If you switch between search menu and search bar (by clicking
the icon on the left), you'll see that they are the same fields.
Typing in the search bar also fills the corresponding field in the
search menu (and vice versa).
- The option *All Names* searches in the item name, item notes, names
of correspondent organization and person, and names of concerning
person and equipment. It uses a simple substring search.
- The option *Contents* searches the contents of all attachments
(documents), attachment names, the item name and item notes. It uses
full text search. However, it does not search the names of attached
meta data.
When searching with one of these fields active, it simply submits the
(hidden) search menu. So if the menu has other fields filled out, they
will affect the result, too. Using one of these fields, the bar is
just a reduced view of the search menu.
So you can choose tags or correspondents in the search menu and
further restrict the results using full text search. The results will
be returned sorted by the item date, newest first.
If the left button in the search bar shows a little blue bubble, it
means that there are more search fields filled out in the search menu
that you currently can't see. In this case the results are not only
restricted by the search term given in the search-bar, but also by
what is specified in the search menu.
### *Contents Only*
This option has no corresponding part in the search menu. When
searching with this option active, only a full text search is done in
the attachment contents, attachment names, item name and item notes.
The results are not ordered by item date, but by relevance with
respect to the search term. This ordering is returned from the full
text search engine and is simply passed on unmodified.
## Search Menu
{{ imgright(file="search-menu.png") }}
The search menu can be opened by clicking the left icon in the top
bar. It shows some options to constrain the item list:
### Show new items
Clicking the checkbox "Only new" shows items that have not been
"Confirmed". All items that have been created by docspell and not
looked at are marked as "new" automatically.
### Names
Searches in names of certain properties. The `All Names` field is the
same as the search in the search bar (see above).
The `Name` field only searches in the name property of an item.
### Folder
Set a folder to only show items in that folder. If no folder is set,
all accessible items are shown. These are all items that either have
no folder set, or a folder where the current user is a member.
### Tags
Specify a list of tags that the items must have. When adding tags to
the "Include" list, an item must have all these tags in order to be
included in the results.
When adding tags to the "Exclude" list, then an item is removed from
the results if it has at least one of these tags.
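The include and exclude semantics can be sketched as a small filter function: every included tag must be present, and a single excluded tag is enough to drop the item. This only illustrates the rule as described here, not docspell's implementation; the function names are invented, and tags are passed as space-separated lists for simplicity.

```bash
#!/usr/bin/env bash

# Succeeds if the space-separated tag list $1 contains the tag $2.
has_tag() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Succeeds if an item with tags $1 passes the include list $2
# (all must match) and the exclude list $3 (none may match).
matches_tag_filter() {
  local item_tags="$1" include="$2" exclude="$3" tag
  for tag in $include; do
    has_tag "$item_tags" "$tag" || return 1
  done
  for tag in $exclude; do
    if has_tag "$item_tags" "$tag"; then return 1; fi
  done
  return 0
}

if matches_tag_filter "invoice todo" "invoice" "done"; then
  echo "item is included"
fi
```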
### Correspondent
Pick a correspondent to show only these items.
### Concerned
Pick a concerned entity to show only these items.
### Date
Specify a date range to show only items whose date property is within
this range. If you want to see items of a specific day, choose the
same day for both fields.
For items that don't have an explicit date property set, the created
date is used.
### Due Date
Specify a date range to show only items whose due date property is
within this range. Items without a due date are not shown.
### Direction
Specify whether to show only incoming, only outgoing or all items.
## Customize Substring Search
The substring search of the *All Names* and *Name* field can be
customized in the following way: A wildcard `*` can be used at the
start or end of a search term to do a substring match. A `*` means
"everything". So a term `*company` matches all names ending in
`company` and `*company*` matches all names containing the word
`company`. The matching is case insensitive.
Docspell adds a `*` to the front and end of a term automatically,
unless one of the following is true:
- The term already has a wildcard.
- The term is enclosed in quotes `"`.
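The wildcard rule above can be written down as a short sketch. It assumes that a quoted term is passed on unchanged, which the text does not spell out, and the function name `normalize_term` is made up for the example.

```bash
#!/usr/bin/env bash

# Add a leading and trailing `*` to a search term, unless the term is
# enclosed in double quotes or already contains a wildcard.
normalize_term() {
  local term="$1"
  case "$term" in
    \"*\") printf '%s\n' "$term" ;;    # enclosed in quotes: keep as is
    *\**)  printf '%s\n' "$term" ;;    # already has a wildcard: keep as is
    *)     printf '*%s*\n' "$term" ;;  # plain term: substring match
  esac
}

normalize_term 'company'
normalize_term '*company'
normalize_term '"company"'
```

So a plain `company` becomes `*company*`, while `*company` and `"company"` are left untouched.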
## Full Text Search
### The Query
The query string for full text search is very powerful. Docspell
currently supports [Apache SOLR](https://lucene.apache.org/solr/) as
full text search backend, so you may want to have a look at their
[documentation on query
syntax](https://lucene.apache.org/solr/guide/8_4/query-syntax-and-parsing.html#query-syntax-and-parsing)
for an in-depth guide.
- Wildcards: `?` matches any single character, `*` matches zero or
more characters
- Fuzzy search: Appending a `~` to a term results in a fuzzy search
(it searches for this term and similarly spelled ones)
- Proximity Search: Search for terms that are near each other, again
using `~` appended to a search phrase. Example: `"cheese cake"~5`.
- Boosting: apply more weight to a term with `^`. Example: in
`cheese^4 cake`, cheese is 4x more important.
Docspell will preprocess the search query to prepare a query for SOLR.
It will by default search all indexed fields, which are: attachment
contents, attachment names, item name and item notes.
### The Results
When using full text search, each item in the result list is annotated
with the highlighted occurrence of the match.
{{ figure(file="search-content-results.png") }}


@ -0,0 +1,70 @@
+++
title = "Send items via E-Mail"
weight = 50
[extra]
mktoc = true
+++
You can send e-mails from within docspell attaching the files of an
item. This is useful to collaborate or share certain documents with
people outside docspell.
All sent mails are stored attached to the item.
## E-Mail Settings (SMTP)
Sending mails requires SMTP settings. Please see the page
about [e-mail settings](@/docs/webapp/emailsettings.md#smtp-settings).
## Sending Mails
Currently, it is possible to send mails related to only one item. You
can define the mail body and docspell will add the attachments of an
item, or you may choose to send the mail without any attachments.
In the item detail view, click on the envelope icon to open the mail
form:
{{ figure(file="mail-item-1.jpg") }}
Then write the mail. Multiple recipients may be specified. The input
field shows completion proposals from all contacts in your address
book (from organizations and persons). Choose an address by pressing
*Enter* or by clicking a proposal from the list. The proposal list can
be iterated by the *Up* and *Down* arrows. You can type in any
address, of course, it doesn't need to match a proposal.
If you have multiple mail settings defined, you can choose in the top
dropdown which account to use for sending.
The last checkbox lets you choose whether docspell should add all
attachments of the item to the mail. If it is unchecked, no
attachments will be added. It is currently not possible to pick
specific attachments; it's all or nothing.
Clicking *Cancel* will delete the inputs and close the mail form, but
clicking the envelope icon again will only close the form without
clearing its contents.
The *Send* button is active once all input fields have been filled.
Once you click *Send*, the docspell server will send the mail using
your connection settings. If that succeeds the mail is saved to the
database and you'll see a message in the form.
## Accessing Sent Mails
If there is an e-mail for an item, a tab shows up at the right side,
next to the attachments.
{{ figure(file="mail-item-2.jpg") }}
This tab shows a list of all mails that have been sent related to this
item.
{{ figure(file="mail-item-3.jpg") }}
Clicking on a mail opens it in detail.
{{ figure(file="mail-item-4.jpg") }}

@ -0,0 +1,114 @@
+++
title = "Meta Data"
weight = 10
[extra]
mktoc = true
+++
Docspell processes each uploaded file. Processing involves extracting
archives, extracting text, analyzing the extracted text and converting
the file into a pdf. Text is analyzed to find metadata that can be set
automatically. Docspell compares the extracted text against a set of
known meta data. The *Meta Data* page lets you manage this meta data:
- Tags
- Organizations
- Persons
- Equipments
- Folders
### Tags
Items can be tagged with multiple custom tags (aka labels). This
makes it possible to describe many different workflows people may have
with their documents.
A tag can have a *category*. This is meant to group tags together. For
example, you may want to have a tag category *doctype* that is
comprised of tags like *bill*, *contract*, *receipt* and so on. Or for
workflows, a tag category *state* may exist that includes tags like
*Todo* or *Waiting*. Or you can tag items with user names to provide
"assignment" semantics. Docspell doesn't propose any workflow, but it
can help to implement some.
The tags are *not* taken into account when processing. Docspell will
not automatically associate tags to your items. The tags are only
meant to be used manually for now.
### Organization and Person
The organization entity represents a non-personal (organization or
company) correspondent of an item. Docspell will choose one or more
organizations when processing documents and associate the "best" match
with your item.
The person entity can appear in two roles: It may be a correspondent
or the person an item is about. So a person is either a correspondent
or a concerning person. Docspell cannot know which person is which,
therefore you need to tell it by checking the box "Use for
concerning person suggestion only". If this is checked, docspell will
use this person only to suggest a concerning person. Otherwise the
person is used only for correspondent suggestions.
Document processing uses the following properties:
- name
- websites
- e-mails
The website and e-mails can be added as contact information. If these
three are present, you should get good matches from docspell. All
other fields of an organization and person are not used during
document processing. They might be useful when using this as a real
address book.
### Equipment
The equipment entity is almost like a tag. In fact, it could be
replaced by a tag with a specific known category. The difference is
that docspell will try to find a match and associate it with your
item. The equipment represents non-personal things that an item is
about. Examples are: bills or insurances for *cars*, contracts for
*houses* or *flats*.
Equipments don't have contact information, so the only property that
is used to find matches during document processing is its name.
### Folders
Folders provide a way to divide all documents into disjoint subsets.
Unlike with tags, an item can have at most one folder or none. A
folder has an owner: the user who created the folder. Additionally,
it can have members: users of the collective that the owner can assign
to a folder.
When searching for items, the results are restricted to items that
have either no folder assigned or a folder where the current user is
owner or member. It can be used to control visibility when searching.
However: there are no hard access checks. For example, if the item id
is known, any user of the collective can see it and modify its meta
data.
One use case is hiding items from other users, like bills for
birthday presents. In this case it is very unlikely that someone can
guess the item-id.
While folders are *not* taken into account when processing documents,
they can be specified with the upload request or a [source
url](uploading#anonymous-upload) to have them automatically set when
they arrive.
## Document Language
An important setting is the language of your documents. This helps OCR
and text analysis. You can select between English and German
currently.
Go to the *Collective Settings* page and click *Document
Language*. This will set the language for all your documents. It is
not (yet) possible to specify it when uploading.


@ -0,0 +1,74 @@
+++
title = "Notify about due items"
weight = 60
[extra]
mktoc = true
+++
A user who provides valid e-mail (SMTP) settings can be notified by
docspell about due items. You will then receive an e-mail containing a
list of items, sorted by their due date.
You first need to define SMTP settings; please see [this
page](@/docs/webapp/emailsettings.md#smtp-settings).
Notifying works simply by searching for due items periodically. The
search task is submitted to the job queue and is eventually picked up
by an available [job executor](joex). This can be set up on the user
settings page.
{{ figure(file="notify-due-items.jpg") }}
First, the task can be enabled or disabled at any time.
Then two settings are required for sending an e-mail. You need to
specify the connection to use and the recipients.
Next are some settings to customize the query for searching items.
You can choose to only include items that have one or more tags (these
are `and`-ed, so all tags must exist on the item). You can also
provide tags that must *not* appear on an item (these tags are
`or`-ed, so only one such tag is enough to exclude an item). A common
use-case would be to manually tag an item with *Done* once there is
nothing more to do. Then these items can be excluded from the search.
The somewhat inverse use-case is to always tag items with a *Todo* tag
and remove it once completed.
The *Remind Days* field specifies the number of days the due date may
be in the future. Each time the task executes, it searches for items
with a due date lower than `today + remindDays`.
If you don't restrict the search using tags, then all items with a due
date lower than this value are selected. Since items are (usually) not
deleted, this only makes sense, if you remove the due date once you
are done with an item.
The last option is to check *cap overdue items*, which uses the value
in *Remind Days* to further restrict the due date of an item: only
those with a due date *greater than* `today - remindDays` are
selected. In other words, only items with an overdue time of *at most*
*Remind Days* are included.
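The two bounds can be illustrated with a little date arithmetic. This is only a sketch of the rule as described here, not docspell's query; it assumes GNU `date` (for the `-d` option), dates in `YYYY-MM-DD` form, and an invented function name `is_due_selected`.

```bash
#!/usr/bin/env bash

# Succeeds if an item with due date $1 would be selected.
#   $2 = today, $3 = remind days, $4 = "true" to cap overdue items
is_due_selected() {
  local due="$1" today="$2" days="$3" cap="${4:-false}"
  local upper lower
  upper=$(date -d "$today +$days days" +%F)   # today + remindDays
  lower=$(date -d "$today -$days days" +%F)   # today - remindDays
  # ISO dates sort like strings, so string comparison is sufficient.
  [[ "$due" < "$upper" ]] || return 1
  if [[ "$cap" == "true" ]]; then
    [[ "$due" > "$lower" ]] || return 1
  fi
  return 0
}

if is_due_selected 2020-07-30 2020-07-27 5; then
  echo "2020-07-30 is reported"
fi
```

With today being 2020-07-27 and five remind days, an item due on 2020-01-01 is selected without the cap, but dropped once the cap is enabled.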
The *Schedule* field specifies the periodicity. The syntax is similar
to a date-time string, like `2019-09-15 12:32`, where each part is a
pattern that can also match multiple values. The UI tries to help a
little by displaying the next two date-times this task would execute.
More detailed help is available
[here](https://github.com/eikek/calev#what-are-calendar-events). For
example, to execute the task every Monday at noon, you would write:
`Mon *-*-* 12:00`. A date-time part can match all values (`*`), a list
of values (e.g. `1,5,12,19`) or a range (e.g. `1..9`). Long lists may
be written in a shorter way using a repetition value. It is written
like this: `1/7` which is the same as a list with `1` and all
multiples of `7` added to it. In other words, it matches `1`, `1+7`,
`1+7+7`, `1+7+7+7` and so on.
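To see which values a repetition covers, it can be expanded by hand. The helper below is a sketch for illustration only; `expand_repetition` is a made-up name, and the upper bound (for example `31` for days of a month) has to be supplied by the caller.

```bash
#!/usr/bin/env bash

# Expand a repetition value such as 1/7 into the list of values it
# matches, up to and including the upper bound given as $2.
expand_repetition() {
  local spec="$1" max="$2"
  local start="${spec%%/*}" step="${spec##*/}"
  local values=() v
  for (( v = start; v <= max; v += step )); do
    values+=("$v")
  done
  echo "${values[*]}"
}

expand_repetition 1/7 31
```

For `1/7` and an upper bound of `31` this yields the days `1`, `8`, `15`, `22` and `29`.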
You can click on *Start Once* to run this task right now, without
saving the form to the database ("right now" means it is picked up by
a free job executor).
If you click *Submit* these settings are saved and the task runs
periodically.
You can see the task executing at the [processing
page](@/docs/webapp/processing.md).


@ -0,0 +1,39 @@
+++
title = "Processing Queue"
weight = 80
[extra]
mktoc = true
+++
The page *Processing Queue* shows the current state of document
processing for your uploads.
At the top of the page a list of running jobs is shown. Below that,
the left column shows jobs that wait to be picked up by the job
executor. On the right are finished jobs. The number of finished jobs
is cut to some maximum and is also restricted by a date range. The
page refreshes itself automatically to show the progress.
Example screenshot:
{{ figure(file="processing-queue.jpg") }}
You can cancel running jobs or remove waiting ones from the queue. If
you click on the small file symbol on finished jobs, you can inspect
its log messages again. A running job displays the job executor id
that executes the job.
The jobs listed here are all long-running tasks for your collective.
Most of the time these are document processing tasks. But user
defined tasks, like "import mailbox", are also visible here.
Since job executors are shared among all collectives, a job may wait
for some time until it is picked up by a job executor. You can always
start more job executors to help out.
If a job fails, it is retried after some time. Only if it fails too
often (this can be configured) is it finished with the *failed* state.
For the document-processing task, if processing finally fails or a job
is cancelled, the item is still created, just without suggestions.


@ -0,0 +1,122 @@
+++
title = "Scan Mailboxes"
weight = 70
[extra]
mktoc = true
+++
Users who provide valid e-mail (IMAP) settings can import mails from
their mailbox into docspell periodically.
You first need to define IMAP settings; please see [this
page](@/docs/webapp/emailsettings.md#imap-settings).
Go to *User Settings -> Scan Mailbox Task*. You can define periodic
tasks that connect to your mailbox and import mails into docspell. It
is possible to define multiple tasks, for example, if you have
multiple e-mail accounts you want to import periodically.
{{ figure(file="scanmailbox-list.png") }}
## Details
Creating a task requires the following information:
{{ figure(file="scanmailbox-detail.png") }}
You can enable or disable this task. A disabled task will not run
periodically. You can still choose to run it manually if you click the
`Start Once` button.
Then you need to specify which [IMAP
connection](@/docs/webapp/emailsettings.md#imap-settings) to use.
A list of folders is required. Docspell will only look into these
folders. You can specify multiple folders. The "Inbox" folder is a
special folder, which usually appears translated in your web-mail
client. You can specify "INBOX" in any letter case; docspell will then
read mails in your inbox. Any other folder name is usually
case-sensitive (this depends on the IMAP server). Type in a folder
name and click the add button on the right.
Then the field *Received Since Hours* defines how many hours to go
back and look for mails. Usually there are many mails in your inbox
and importing them all at once is not feasible or desirable. It can
work together with the *Schedule* field below. For example, you could
run this task every 6 hours and read mails from 8 hours back.
The next two settings tell docspell what to do once a mail has been
submitted to docspell. It can be moved into another folder in your
mail account. This moves it out of the way for the next run. You can
also choose to delete the mail, but *note that it will really be
deleted and not moved to your trash folder*. If both options are off,
nothing happens with that mail, it simply stays (and could be re-read
on the next run).
When docspell creates an item from a mail, it needs to set a direction
value (incoming or outgoing). If you know that all mails you want to
import have a specific direction, then you can set it here. Otherwise,
*automatic* means that docspell chooses a direction based on the
`From` header of a mail. If the `From` header is an e-mail address
that belongs to a “concerning” person in your address book, then it is
set to "outgoing". Otherwise it is set to "incoming". To support this,
you need to add your own e-mail address(es) to your address book.
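The *automatic* rule boils down to a lookup of the `From` address. A minimal sketch of that decision, assuming the addresses of the concerning persons are passed in as plain arguments and compared exactly; the function name `guess_direction` and the example addresses are invented:

```bash
#!/usr/bin/env bash

# Print "outgoing" if the From address ($1) is one of the addresses of
# the concerning persons (all remaining arguments), else "incoming".
guess_direction() {
  local from="$1"; shift
  local own
  for own in "$@"; do
    if [[ "$from" == "$own" ]]; then
      echo "outgoing"
      return 0
    fi
  done
  echo "incoming"
}

guess_direction "me@example.com" "me@example.com" "me@work.example"
```

This also shows why your own addresses must be in the address book: with an empty list, every mail ends up as incoming.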
The *Item Folder* setting is used to put all items that are created
from mails into the specified [folder](metadata#folders). If you
define a folder here, where you are not a member, you won't find
resulting items.
The last field is the *Schedule*, which defines when and how often this
task should run. The syntax is similar to a date-time string, like
`2019-09-15 12:32`, where each part is a pattern that can also match
multiple values. The UI tries to help a little by displaying the next
two date-times at which this task would execute. More in-depth help is
available
[here](https://github.com/eikek/calev#what-are-calendar-events). For
example, to execute the task every Monday at noon, you would write:
`Mon *-*-* 12:00`. A date-time part can match all values (`*`), a list
of values (e.g. `1,5,12,19`) or a range (e.g. `1..9`). Long lists may
be written in a shorter way using a repetition value. It is written
like this: `1/7`, which is the same as a list containing `1` and every
value reached by repeatedly adding `7` to it. In other words, it
matches `1`, `1+7`, `1+7+7`, `1+7+7+7` and so on.
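To make the three numeric pattern forms concrete, here is a small Python sketch that expands one date-time part into the values it matches. It is only an illustration of the rules described above, not the parser used by docspell or calev; the function name `expand_part` and the explicit `lo`/`hi` bounds (for example 1 and 31 for a day-of-month part) are assumptions, and weekday names such as `Mon` are not handled.

```python
def expand_part(pattern, lo, hi):
    """Expand one numeric date-time part into the sorted values it matches.

    Supports `*`, comma-separated lists, ranges (`1..9`) and repetitions
    (`1/7`). `lo` and `hi` are the inclusive bounds of the part.
    """
    if pattern == "*":
        return list(range(lo, hi + 1))
    values = set()
    for piece in pattern.split(","):
        if "/" in piece:
            # repetition: start, start+step, start+2*step, ...
            start, step = (int(x) for x in piece.split("/"))
            values.update(range(start, hi + 1, step))
        elif ".." in piece:
            # inclusive range
            first, last = (int(x) for x in piece.split(".."))
            values.update(range(first, last + 1))
        else:
            values.add(int(piece))
    # drop anything outside the valid bounds of this part
    return sorted(v for v in values if lo <= v <= hi)
```

For a day-of-month part, `expand_part("1/7", 1, 31)` yields `[1, 8, 15, 22, 29]`.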
## Reading Mails twice / Duplicates
Since users can move mails around in their mailboxes, it can happen
that docspell unintentionally reads a mail multiple times. When
docspell reads a mail, it first checks whether an item already exists
that originated from this mail. It only proceeds to import the mail if
it cannot find one. If you deleted an item in the meantime, docspell
would import the mail again.
This check uses the
[`Message-ID`](https://en.wikipedia.org/wiki/Message-ID) of an e-mail.
This is usually present and should identify a complete mail. But it
won't catch duplicate mails that are sent multiple times, since they
might have different `Message-ID`s. Also, some mails have no such id
and are then imported by docspell without any checks.
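The check can be sketched in a few lines of Python. This is a simplified illustration, not docspell's code: `should_import` is a hypothetical name, and `known_message_ids` stands in for whatever lookup docspell performs against existing items.

```python
from email import message_from_string


def should_import(raw_mail, known_message_ids):
    """Decide whether a mail should be imported, based on its Message-ID.

    `known_message_ids` is a hypothetical set of ids belonging to items
    that already exist.
    """
    message = message_from_string(raw_mail)
    message_id = message.get("Message-ID")
    if message_id is None:
        # mails without an id cannot be checked and are always imported
        return True
    return message_id.strip() not in known_message_ids
```

Note how a mail that was sent twice with different `Message-ID`s passes this check both times, which is the limitation described above.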
In later versions, docspell may use the checksum of the generated eml
file to look for duplicates, too.
## How it works
Docspell will go through all folders and download mails in “batches”.
The batch size can be set by the admin in the [configuration
file](@/docs/configure/_index.md#joex) and applies to all these tasks
(same for all users). A batch only contains the mail headers and
not the complete mails.
Then each mail is downloaded completely one by one and converted into
an [eml](https://en.wikipedia.org/wiki/Email#Filename_extensions) file
which is then submitted to docspell. Then the usual processing
machinery starts, just like uploading an eml file via the webapp.
The number of folders and the number of mails to import can be limited
by an admin via the config file. Note that this limit applies to a
single task run only; it is meant to limit the resources one task can
consume.
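The following Python sketch shows how batching and per-run limits could interact. It is a rough model under stated assumptions, not docspell's implementation: the names `plan_run`, `max_folders`, `max_mails` and `batch_size` are hypothetical, and the sketch assumes that the mail limit counts mails across all folders of one run.

```python
from itertools import islice


def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def plan_run(folders, max_folders, max_mails, batch_size):
    """Return the (folder, batch) pairs one task run would process.

    `folders` maps a folder name to the list of mail headers in it.
    """
    plan = []
    remaining = max_mails
    for name in list(folders)[:max_folders]:
        for batch in batches(folders[name], batch_size):
            if remaining <= 0:
                return plan
            # cut the last batch so the run never exceeds the mail limit
            batch = batch[:remaining]
            plan.append((name, batch))
            remaining -= len(batch)
    return plan
```

Mails that are not reached because of a limit are simply left for a later run.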

View File

@ -0,0 +1,177 @@
+++
title = "Uploads"
weight = 0
+++
This page describes how files can get into docspell. Technically,
there is just one way: via HTTP `multipart/form-data` requests.
## Authenticated Upload
From within the web application there is the "Upload Files"
page. There you can select multiple files to upload. You can also
specify whether these files should become one item or if every file is
a separate item.
When you click "Submit" the files are uploaded and stored in the
database. Then the job executor(s) are notified and immediately
start processing them.
Go to the top-right menu and click "Processing Queue" to see the
current state.
This obviously requires an authenticated user. While this is handy for
ad-hoc uploads, it is very inconvenient for automation with custom
scripts. The next variant exists for that purpose.
## Anonymous Upload
It is also possible to upload files without authentication. This
should make tools that interact with docspell much easier to write.
### Creating Anonymous Uploads
Go to "Collective Settings" and then to the "Source" tab. A *Source*
identifies an endpoint where files can be uploaded
anonymously. Creating a new source creates a long unique id which is
part of a URL that can be used to upload files. You can deactivate or
delete the source at any time, at which point uploading is no longer
possible. The idea is that you can give this URL away safely: you
can delete it at any time, and no passwords or secrets are visible;
even your username is not visible.
Example screenshot:
{{ figure(file="sources-form.png") }}
This example shows a source with name "test". Besides a description
and a name that is only used for display purposes, a priority and a
[folder](@/docs/webapp/metadata.md#folders) can be specified.
The priority is used for the processing jobs that are submitted when
files are uploaded via this endpoint.
All items that result from uploads to this endpoint are placed into
the specified folder.
The source endpoint defines two urls:
- `/app/upload/<id>`
- `/api/v1/open/upload/item/<id>`
The first points to a web page where anyone can upload files into
your account. You could give this url to people for sending files
directly into your docspell.
The second url is the API url, which accepts the requests to upload
files (it is also used by the web page behind the first url).
For example, this url can be used to upload files with curl:
``` bash
$ curl -XPOST -F file=@test.pdf http://localhost:7880/api/v1/open/upload/item/CqpFTb7UmGe-9nMVPZSmnwc-AHH6nWFh52t-M1JFQ9y7cdH
{"success":true,"message":"Files submitted."}
```
You could add more `-F file=@/path/to/your/file.pdf` to upload
multiple files (note, the `@` is required by curl, so it knows that
the following is a file).
When files are uploaded to a source endpoint, the items resulting
from these uploads are marked with the name of the source. So you know
which source an item originated from.
If files are uploaded using the web application's *Upload files* page,
the source is implicitly set to `webapp`. If you also want to let
docspell count the files uploaded through the web interface, just
create a source (it can be inactive) with that name (`webapp`).
## Integration Endpoint
Another option for uploading files is the special *integration
endpoint*. This endpoint allows an admin to upload files to any
collective that is known by name.
```
/api/v1/open/integration/item/[collective-name]
```
The endpoint is behind `/api/v1/open`, so this route is not protected
by an authentication token (see [REST Api](@/docs/api/_index.md) for
more information). However, it can be protected via settings in the
configuration file. The idea is that this endpoint is controlled by an
administrator and not the user of the application. The admin can
enable this endpoint and choose between some methods to protect it.
Then the administrator can upload files to any collective. This might
be useful to connect other trusted applications to docspell (that run
on the same host or network).
The endpoint is disabled by default; an admin must change the
`docspell.server.integration-endpoint.enabled` flag to `true` in the
[configuration file](@/docs/configure/_index.md#rest-server).
If queried by a `GET` request, it returns whether it is enabled and
whether the collective exists.
It is also possible to check for existing files using their sha256
checksum with:
```
/api/v1/open/integration/checkfile/[collective-name]/[sha256-checksum]
```
See the [SMTP gateway](@/docs/tools/smtpgateway.md) or the [consumedir
script](@/docs/tools/consumedir.md) for examples of how to use this endpoint.
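As a small illustration of the check-file route, the following Python sketch computes the sha256 checksum of a file's content and builds the corresponding url. The function name `checkfile_url` and its parameters are made up for this example, and it assumes that the checksum is passed as a lowercase hex string; the format of the response is not shown here.

```python
import hashlib
from urllib.parse import quote


def checkfile_url(base_url, collective, data):
    """Build the check-file url for the given file content (bytes).

    Assumes the checksum is expected as a hex-encoded sha256 digest.
    """
    checksum = hashlib.sha256(data).hexdigest()
    return "{}/api/v1/open/integration/checkfile/{}/{}".format(
        base_url.rstrip("/"),
        quote(collective, safe=""),  # escape characters not allowed in a path segment
        checksum,
    )
```

A script could call this with the bytes of a local file and only upload the file if docspell does not know it yet.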
## The Request
This section gives more details about the upload request. It is an
HTTP `multipart/form-data` request with two possible fields:
- meta
- file
The `file` field can appear multiple times and is required at least
once. It is the part containing the file to upload.
The `meta` part is completely optional and can define additional meta
data that docspell uses to create items from the given files. It
allows transferring structured information together with the
unstructured binary files.
The `meta` content must be `application/json` containing this
structure:
``` elm
{ multiple: Bool
, direction: Maybe String
, folder: Maybe String
}
```
The `multiple` property defaults to `true`. It means that each file
in the upload request corresponds to a single item. An upload with 5
files will result in 5 items created. If it is `false`, then docspell
will create just one item that contains all files.
Furthermore, the direction of the document (one of `incoming` or
`outgoing`) can be given. It is optional: it can be left out or set to
`null`.
A `folder` id can be specified. Each item created by this request will
be placed into this folder. Errors are logged (for example, the folder
may have been deleted before the task is executed) and the item is
then not put into any folder.
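A client script might build the `meta` JSON with a small helper like the following Python sketch. The helper name `build_meta` is hypothetical; only the three properties and the allowed direction values are taken from the description above.

```python
import json


def build_meta(multiple=True, direction=None, folder=None):
    """Return the JSON string for the `meta` part of an upload request."""
    if direction not in (None, "incoming", "outgoing"):
        raise ValueError("direction must be 'incoming', 'outgoing' or None")
    # None is serialized as JSON null, which is allowed for optional values
    return json.dumps(
        {"multiple": multiple, "direction": direction, "folder": folder}
    )
```

For example, `build_meta(multiple=False, direction="outgoing")` produces a JSON object asking docspell to create one outgoing item from all files in the request.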
This kind of request is very common and most programming languages
have support for this. For example, here is another curl command
uploading two files with meta data:
``` bash
curl -XPOST -F meta='{"multiple":false, "direction": "outgoing"}' \
-F file=@letter-en-source.pdf \
-F file=@letter-de-source.pdf \
http://localhost:7880/api/v1/open/upload/item/CqpFTb7UmGe-9nMVPZSmnwc-AHH6nWFh52t-M1JFQ9y7cdH
```
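For scripts that cannot shell out to curl, the same request body can be assembled by hand. The Python sketch below uses only the standard library; `encode_multipart` is a hypothetical helper, and sending the `meta` part with an explicit `application/json` content type is an assumption based on the description above.

```python
import uuid


def encode_multipart(files, meta_json=None, boundary=None):
    """Encode an upload request body.

    `files` is a list of (filename, bytes) pairs. Returns the body bytes
    and the value for the Content-Type request header.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    if meta_json is not None:
        parts.append((
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="meta"\r\n'
            "Content-Type: application/json\r\n\r\n"
            f"{meta_json}\r\n"
        ).encode("utf-8"))
    for filename, content in files:
        # every file is sent in its own part, all named "file"
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        parts.append(head + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and content type can then be sent as a `POST` request to the upload url of a source, for example with `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`.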