Some other filetypes, like office documents, are also zip file. To
distinguish these without unpacking them, the file extensions is
checked.
Fixes: #1365
It's more convenient to manage notification channels separately, as it
is done with email settings. Notification hook and other forms are
adopted to only select channels. Hooks can now use more than one
channel.
This is a start to have different kinds of notifications. It is
possible to be notified via e-mail, matrix or gotify. It also extends
the current "periodic query" for due items by allowing notification
over different channels. A "generic periodic query" variant is added
as well.
There are models for Spanish, that have been added now. Also the
Hungarian language has been added to the list of supported
languages (for tesseract mainly, no nlp models)
It was wrongly stored using RPeriodicTask directly, but the higher
level `UserTask` must be used instead, because this ensures a
correctly scoped periodic task using the `updateOneTask` method. Since
this is a system task, it can be given a fixed ID which makes it now
safe even if stored using RPeriodicTask directly.
The bug resulted in multiple empty-trash tasks to be inserted (on each
restart).
Refs: #347
First, when checking for existence of a file, deleted items are not
conisdered.
The working with fulltext search has been changed: deleted items are
removed from fulltext index and are re-added when they are restored.
The fulltext index currently doesn't hold the item state and it would
mean much more work to introduce it into the index (or, worse, to
reprocess the results from the index). Thus, deleted items can only be
searched using database queries. It is probably a very rare use case
when fulltext search should be applied to deleted items. They have
been deleted for a reason and the most likely case is that they are
simply removed.
Refs: #347
In order to keep deleted items for a while, the periodic task can now
use a duration to only remove items with a certain age. This can be
used to ensure that a deleted item stays at least X days before it
will be removed from the database.
Refs: #347
Before, there were periodic tasks run per collective and not user by
making sure that submitter + group are the same value. This is now
encoded in `UserTaskScope` so it is now obvious and errors can be
reduced when using this.
When processing a new file conversion and text extraction is skipped
if detected to be already done. This prevents running expensive tasks
again after restarting/retrying. When explicitely reprocessing a file,
these tasks should run again and replace the existing results.
When reprocessing an item, the metadat of all *files* are replaced.
This change now also sets some metadat to an item, but only if the
item is not in state "confirmed". Confirmed items are not touched, but
the metadata of the files is updated.
This drops fomantic-ui as css toolkit and introduces tailwindcss. With
tailwind there are no predefined components, but it's very easy to
create those. So customizing the look&feel is much simpler, most of
the time no additional css is needed.
This requires a complete rewrite of the markup + styles. Luckily all
logic can be kept as is. The now old ui is not removed, it is still
available by using a request header `Docspell-Ui` with a value of `1`
for the old ui and `2` for the new ui.
Another addition is "dev mode", where docspell serves assets with a
no-cache header, to disable browser caching. This makes developing a
lot easier.
Mails are filtered once by using an imap search and then by some globs
to filter files and subjects. Imap can search by subject via a
string-contains, but not via globs or patterns (afaik). The subject
filter is applied to all downloaded mail headers. Now for post
processing (moving to some target folder or deleting), it can be
chosen to post-process all "seen" mails or only those that matched all
filters.
Classifier don't work on each attachment, but on all. So the results
must not be stored at an attachment. This reverts some previous
changes to put the classifier results for item entities into its own
table.
When re-indexing everything, skip intermediate populating the index
and do this as the very last step.
Parameterize adding new fields by their language.
Improves and reorganizes how nlp pipelines are setup. Now users can
choose from many options, depending on their hardware and usage
scenario.
This is the base to use more languages without depending on what
stanford-nlp supports. Support then is involves to text extraction and
simple regex-ner processing.