---
layout: docs
title: Adding Meta Data
permalink: doc/metadata
---

# {{ page.title }}

## Meta Data

Docspell processes each uploaded file. Processing involves extracting
archives, extracting text, analyzing the extracted text, and converting
the file into a PDF. The text is analyzed to find meta data that can be
set automatically: Docspell compares the extracted text against a set
of known meta data. The *Meta Data* page lets you manage this meta
data. You can create the following:

- Tags
- Organizations
- Persons
- Equipment

### Tags

Items can be tagged with multiple custom tags (aka labels). This makes
it possible to describe the many different workflows people may have
with their documents.

A tag can have a *category*, which is meant to group tags together. For
example, you may want a tag category *doctype* that comprises tags
like *bill*, *contract*, *receipt* and so on. For workflows, a tag
category *state* may exist that includes tags like *Todo* or
*Waiting*. You can also tag items with user names to provide
"assignment" semantics. Docspell doesn't prescribe any workflow, but it
can help you implement one.

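To illustrate how categories group tags, here is a minimal Python
sketch. It is not Docspell code: the `Tag` class and the
`group_by_category` helper are hypothetical names chosen for this
example, and the tag names are the ones mentioned above plus an
uncategorized *important* tag added as an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tag:
    """Hypothetical model of a tag; the category is optional."""
    name: str
    category: Optional[str] = None


def group_by_category(tags):
    """Group tag names by category; tags without one end up under None."""
    groups = defaultdict(list)
    for tag in tags:
        groups[tag.category].append(tag.name)
    return dict(groups)


tags = [
    Tag("bill", "doctype"),
    Tag("contract", "doctype"),
    Tag("Todo", "state"),
    Tag("Waiting", "state"),
    Tag("important"),  # assumed example of a tag without a category
]
groups = group_by_category(tags)
```

Here `groups["doctype"]` holds the document type tags and
`groups["state"]` the workflow tags, which mirrors how a category can
be used to present related tags together.
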
Tags are *not* taken into account during processing. Docspell will not
automatically associate tags with your items; for now, tags are meant
to be applied manually.

### Organization and Person

The organization entity represents a non-personal correspondent (an
organization or company) of an item. Docspell will choose one or more
organizations when processing documents and associate the "best" match
with your item.

The person entity can appear in two roles: it may be a correspondent
or the person an item is about (the concerning person). Docspell
cannot know which role a person has, so you need to tell it by
checking the box "Use for concerning person suggestion only". If this
is checked, Docspell will use this person only to suggest a concerning
person. Otherwise, the person is used only for correspondent
suggestions.

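The effect of the checkbox can be pictured with a small Python sketch.
This only models the rule described above; the `Person` class, the
`concerning_only` field and the `split_by_role` helper are hypothetical
names, not Docspell's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    # Hypothetical field standing in for the checkbox
    # "Use for concerning person suggestion only".
    concerning_only: bool = False


def split_by_role(persons):
    """Return (correspondent candidates, concerning person candidates)."""
    correspondents = [p.name for p in persons if not p.concerning_only]
    concerning = [p.name for p in persons if p.concerning_only]
    return correspondents, concerning


persons = [
    Person("Jane Doe", concerning_only=True),
    Person("John Smith"),
]
correspondents, concerning = split_by_role(persons)
```

Each person lands in exactly one of the two lists, which reflects that
a person is used either for correspondent suggestions or for
concerning person suggestions, never both.
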
Document processing uses the following properties:

- name
- websites
- e-mails

Websites and e-mails can be added as contact information. If all three
properties are present, you should get good matches from Docspell. All
other fields of an organization or person are not used during document
processing. They might be useful when using this as a real address
book.

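Why having all three properties helps can be shown with a deliberately
naive Python sketch. It simply counts which of the three properties
occur in the extracted text and picks the highest count. This is an
assumption made for illustration and not Docspell's actual matching
algorithm; the `Organization` class, the `score` and `best_match`
functions and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    name: str
    websites: list = field(default_factory=list)
    emails: list = field(default_factory=list)


def score(org, text):
    """Count how many of the three properties occur in the text (0 to 3)."""
    haystack = text.lower()
    name_hit = org.name.lower() in haystack
    website_hit = any(w.lower() in haystack for w in org.websites)
    email_hit = any(e.lower() in haystack for e in org.emails)
    return sum([name_hit, website_hit, email_hit])


def best_match(orgs, text):
    """Return the highest scoring organization, or None if nothing matches."""
    scored = [(score(org, text), org) for org in orgs]
    top_score, top_org = max(scored, key=lambda pair: pair[0], default=(0, None))
    return top_org if top_score > 0 else None


acme = Organization("Acme Corp", ["acme.example"], ["billing@acme.example"])
globex = Organization("Globex", ["globex.example"])
text = (
    "Invoice from ACME Corp. Questions? Write to billing@acme.example "
    "or visit https://acme.example"
)
match = best_match([acme, globex], text)
```

In this sketch an organization with only a name can score at most 1,
while one with a name, a website and an e-mail can score up to 3, so
filling in the contact information gives the comparison more to work
with.
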
### Equipment

The equipment entity is almost like a tag. In fact, it could be
replaced by a tag with a specific, known category. The difference is
that Docspell will try to find a match and associate it with your
item. Equipment represents non-personal things that an item is about.
Examples are bills or insurance policies for *cars*, or contracts for
*houses* or *flats*.

Equipment doesn't have contact information, so the only property used
to find matches during document processing is its name.

## Document Language

An important setting is the language of your documents. This helps OCR
and text analysis. Currently, you can choose between English and
German.

Go to the *Collective Settings* page and click *Document
Language*. This will set the language for all your documents. It is
not (yet) possible to specify it when uploading.