Update documentation
@@ -286,16 +286,13 @@ docspell.joex {
 # 4. disabled - doesn't use any stanford-nlp feature
 #
 # The full and basic variants rely on pre-build language models
-# that are available for only 3 lanugages at the moment: German,
-# English and French.
-#
-# Memory usage varies greatly among the languages. German has
-# quite large models, that require about 1G heap. So joex should
-# run with -Xmx1500M at least when using mode=full.
+# that are available for only a few languages. Memory usage
+# varies among the languages. So joex should run with -Xmx1400M
+# at least when using mode=full.
 #
 # The basic variant does a quite good job for German and
 # English. It might be worse for French, always depending on the
-# type of text that is analysed. Joex should run with about 600M
+# type of text that is analysed. Joex should run with about 500M
 # heap, here again lanugage German uses the most.
 #
 # The regexonly variant doesn't depend on a language. It roughly
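As a rough orientation only, the mode and heap guidance in this hunk could translate into joex settings along the lines of the sketch below. Only mode=full and the heap numbers appear in this commit; the enclosing text-analysis/nlp block names and the JAVA_OPTS hint are assumptions about the surrounding config, not part of the diff.

docspell.joex {
  text-analysis {
    nlp {
      # one of: full, basic, regexonly, disabled (see the comments above)
      mode = full
    }
  }
}
# assumption: with mode=full the joex JVM would be started with at least
# -Xmx1400M (for example via JAVA_OPTS); mode=basic gets by with roughly 500M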
@@ -349,25 +346,23 @@ docspell.joex {
 
 # Settings for doing document classification.
 #
-# This works by learning from existing documents. A collective can
-# specify a tag category and the system will try to predict a tag
-# from this category for new incoming documents.
-#
-# This requires a satstical model that is computed from all
-# existing documents. This process is run periodically as
-# configured by the collective. It may require a lot of memory,
-# depending on the amount of data.
+# This works by learning from existing documents. This requires a
+# satstical model that is computed from all existing documents.
+# This process is run periodically as configured by the
+# collective. It may require more memory, depending on the amount
+# of data.
 #
 # It utilises this NLP library: https://nlp.stanford.edu/.
 classification {
-  # Whether to enable classification globally. Each collective can
-  # decide to disable it. If it is disabled here, no collective
-  # can use classification.
+  # enable/disable auto-tagging. The classifier is also used for
+  # finding correspondents and concerned entities, if enabled
+  # here.
   enabled = true
 
   # If concerned with memory consumption, this restricts the
   # number of items to consider. More are better for training. A
-  # negative value or zero means no train on all items.
+  # negative value or zero means to train on all items.
   item-count = 0
 
   # These settings are used to configure the classifier. If
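If memory consumption is the concern this hunk describes, the classification block might be tuned as in the sketch below. The enabled and item-count keys are taken from the diff itself; the enclosing text-analysis block name and the value 600 are illustrative assumptions, not part of this commit.

docspell.joex {
  text-analysis {
    classification {
      # auto-tagging; also used for correspondents/concerned entities if enabled
      enabled = true
      # cap the training set to bound memory; zero or negative trains on all items
      item-count = 600
    }
  }
}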