When processing a new file, conversion and text extraction are skipped
if they are detected to be already done. This prevents running these
expensive tasks again after a restart or retry. When explicitly
reprocessing a file, these tasks run again and replace the existing
results.
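A minimal sketch of this skip logic, assuming hypothetical names (`FileMeta`, `runExtraction`) that are not Docspell's actual code:

```scala
object SkipIfDone {
  final case class FileMeta(id: String, extractedText: Option[String])

  // Run the expensive task only if there is no result yet, or if an
  // explicit reprocess was requested; in that case replace the old result.
  def runExtraction(meta: FileMeta, reprocess: Boolean)(extract: => String): FileMeta =
    meta.extractedText match {
      case Some(_) if !reprocess => meta // already done, keep the existing result
      case _                     => meta.copy(extractedText = Some(extract))
    }
}
```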
When reprocessing an item, the metadata of all *files* is replaced.
This change now also sets some metadata on the item itself, but only if
the item is not in state "confirmed". Confirmed items are not touched,
but the metadata of their files is still updated.
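A small sketch of the item part of this rule; the state names and the `Item` shape are illustrative, not the real schema:

```scala
object ReprocessMeta {
  sealed trait ItemState
  case object Created   extends ItemState
  case object Confirmed extends ItemState

  final case class Item(state: ItemState, metadata: Map[String, String])

  // File metadata is always replaced; item metadata only when not confirmed.
  def applyItemMeta(item: Item, newMeta: Map[String, String]): Item =
    item.state match {
      case Confirmed => item                          // confirmed items are not touched
      case _         => item.copy(metadata = newMeta) // otherwise replace the metadata
    }
}
```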
This drops fomantic-ui as the CSS toolkit and introduces tailwindcss.
With tailwind there are no predefined components, but it is very easy
to create them. So customizing the look & feel is much simpler; most of
the time no additional CSS is needed.
This requires a complete rewrite of the markup and styles. Luckily, all
logic can be kept as is. The old UI is not removed; it is still
available by sending the request header `Docspell-Ui` with a value of
`1` for the old UI and `2` for the new UI.
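A sketch of how such a header value could be mapped to a UI version. Only the header name and the values `1`/`2` come from this change; the code and the fallback behaviour are assumptions:

```scala
object UiSelect {
  sealed trait UiVersion
  case object OldUi extends UiVersion
  case object NewUi extends UiVersion

  // None means the header is absent or unknown; the server then serves its default UI.
  def selectUi(headers: Map[String, String]): Option[UiVersion] =
    headers.get("Docspell-Ui").collect {
      case "1" => OldUi
      case "2" => NewUi
    }
}
```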
Another addition is a "dev mode", where Docspell serves assets with a
no-cache header to disable browser caching. This makes development a
lot easier.
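A tiny sketch of the idea; the exact `Cache-Control` values are illustrative, not taken from the change:

```scala
object AssetHeaders {
  // In dev mode, assets are delivered with caching disabled so the browser
  // always fetches the latest build; otherwise a normal cache lifetime is used.
  def cacheControl(devMode: Boolean): (String, String) =
    if (devMode) "Cache-Control" -> "no-cache, no-store, must-revalidate"
    else "Cache-Control" -> "max-age=604800" // e.g. cache static assets for a week
}
```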
Mails are filtered first by an IMAP search and then by globs that match
file names and subjects. IMAP can search subjects via a string-contains
match, but not via globs or patterns (afaik), so the subject filter is
applied to all downloaded mail headers. For post-processing (moving to
some target folder or deleting), it can now be chosen whether to
post-process all "seen" mails or only those that matched all filters.
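A minimal sketch of glob matching on subjects plus the post-processing choice; the glob translation (only `*` and `?`) and all names are assumptions, not Docspell's implementation:

```scala
object MailFilter {
  private val special = Set('\\', '.', '[', ']', '{', '}', '(', ')', '+', '-', '^', '$', '|')

  // Translate a simple glob into a regular expression.
  def globToRegex(glob: String): String =
    "^" + glob.flatMap {
      case '*'                      => ".*"
      case '?'                      => "."
      case c if special.contains(c) => "\\" + c
      case c                        => c.toString
    } + "$"

  def matchesSubject(subject: String, glob: String): Boolean =
    subject.matches(globToRegex(glob))

  // Post-processing: either all mails marked "seen", or only those matching all filters.
  def toPostProcess[A](seen: List[A], matched: List[A], onlyMatched: Boolean): List[A] =
    if (onlyMatched) matched else seen
}
```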
The classifier doesn't work on each attachment separately, but on all
of them together. So the results must not be stored at an attachment.
This reverts some previous changes to put the classifier results for
item entities into their own table.
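A sketch of why the result belongs to the item; the types and the concatenation step are illustrative only:

```scala
object ItemClassify {
  final case class Attachment(name: String, text: String)
  final case class Classification(label: String, confidence: Double)

  // The classifier sees the text of all attachments combined, so there is one
  // result per item (stored with the item), not one per attachment.
  def classifyItem(attachments: List[Attachment], classify: String => Classification): Classification =
    classify(attachments.map(_.text).mkString("\n\n"))
}
```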
When re-indexing everything, skip the intermediate populating of the
index and do this only as the very last step.
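A rough sketch of that ordering, with made-up step names:

```scala
object Reindex {
  // Intermediate steps only prepare the index; populating it happens once, at the end.
  def reindexAll(clearIndex: () => Unit, migrations: List[() => Unit], populate: () => Unit): Unit = {
    clearIndex()
    migrations.foreach(step => step())
    populate() // the only populating step, done last
  }
}
```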
Parameterize adding new fields by their language.
Improves and reorganizes how NLP pipelines are set up. Users can now
choose from many options, depending on their hardware and usage
scenario.
This is the basis for using more languages without depending on what
stanford-nlp supports. Support for such languages then involves text
extraction and simple regex-NER processing.
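A sketch of the idea of selecting a pipeline from a configured mode; the mode names, the `Language` shape and the fallback rule are assumptions and may differ from Docspell's actual settings:

```scala
object NlpSetup {
  sealed trait NlpMode
  case object FullNlp   extends NlpMode // complete stanford-nlp pipeline, needs the most memory
  case object BasicNlp  extends NlpMode // reduced stanford-nlp pipeline
  case object RegexOnly extends NlpMode // no stanford-nlp, only regex-NER over known names
  case object Disabled  extends NlpMode

  final case class Language(code: String, stanfordSupport: Boolean)

  // Pick a pipeline from the configured mode; languages without stanford-nlp
  // models fall back to text extraction plus regex-NER.
  def pipelineFor(mode: NlpMode, lang: Language): String =
    mode match {
      case Disabled                   => "no text analysis"
      case RegexOnly                  => "regex-ner"
      case _ if !lang.stanfordSupport => "regex-ner"
      case FullNlp                    => "stanford-nlp (full)"
      case BasicNlp                   => "stanford-nlp (basic)"
    }
}
```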