docspell

mirror of https://github.com/TheAnachronism/docspell.git synced 2025-06-22 10:28:27 +00:00

Author	SHA1	Message	Date
GooRoo	61d5585e68	Add Ukrainian language	2022-11-09 22:24:32 +01:00
eikek	c0feb13f63	Add Estonian language Closes: #1646	2022-11-01 01:00:16 +01:00
eikek	5ec311c331	Add Polish to processing languages SOLR doesn't support Polish out of the box. Plugins are required for Polish. The language has been added only with basic support. For better results, a manual setup of SOLR is required. Closes: #1345	2022-05-21 14:41:16 +02:00
eikek	9d69401fea	Add Lithuanian to processing languages SOLR doesn't support Lithuanian, maybe it can be added via plugins. A manual setup of solr is required then. It has been added with basic support. Closes: #1540	2022-05-21 14:36:01 +02:00
eikek	7fdd78ad06	Experiment with addons Addons allow executing external programs in some context inside docspell. Currently it is possible to run them after processing files. Addons are provided by URLs to zip files.	2022-05-15 23:46:43 +02:00
eikek	9eb9497675	Fix logging in tests	2022-02-19 23:33:01 +01:00
eikek	e483a97de7	Adapt to new logging API	2022-02-19 21:41:38 +01:00
eikek	501c6f2988	Updating stanford corenlp to 4.3.2; adding more languages There are models for Spanish, that have been added now. Also the Hungarian language has been added to the list of supported languages (for tesseract mainly, no nlp models)	2021-11-20 14:31:39 +01:00
eikek	9013f2de5b	Update scalafmt settings	2021-09-22 17:23:24 +02:00
eikek	9785db0683	Change license header of all files	2021-09-21 22:35:38 +02:00
Scala Steward	e4fecefaea	Reformat with scalafmt 3.0.0	2021-08-19 08:50:30 +02:00
eikek	1901fe1a8c	Adopt deprecated APIs from fs2; use fs2.Path	2021-08-07 17:51:56 +02:00
eikek	4af8dd0950	Preprocess japanese texts to find dates Not very efficient, but should work to find the position of dates in japanese text.	2021-07-29 01:35:15 +02:00
wallace	e8348e2809	Remove excessive spaces	2021-07-29 02:08:48 +03:00
wallace11	1095a7d56f	Add another Japanese test	2021-07-29 01:13:22 +03:00
wallace11	119a4ffdc9	Update Japanese tests with more sensible data	2021-07-29 01:08:48 +03:00
eikek	f994d4b248	Add japanese document language	2021-07-28 20:05:48 +02:00
eikek	8e5c88fd32	Add copyright header to source files	2021-07-04 10:57:53 +02:00
eikek	bd791b4593	Upgrade code base to CE3	2021-06-22 22:53:34 +02:00
Eike Kettner	e1bbc2edf5	Apply autoformat	2021-04-10 16:31:58 +02:00
Eike Kettner	6a63694a3e	Convert unit tests to munit	2021-03-10 19:48:56 +01:00
Eike Kettner	9991ad5fcc	Add latvian language	2021-03-09 00:23:17 +01:00
Eike Kettner	c7d4c77e6d	Allow more suggestions for date variants in English	2021-02-26 00:35:17 +01:00
Eike Kettner	ff121d462c	Disable memory intensive tests on travis	2021-01-18 17:41:40 +01:00
Eike Kettner	f01646aeb5	Reorganize nlp pipeline and add nlp-unsupported language Italian Improves and reorganizes how nlp pipelines are set up. Now users can choose from many options, depending on their hardware and usage scenario. This is the base to use more languages without depending on what stanford-nlp supports. Support then involves text extraction and simple regex-ner processing.	2021-01-18 17:41:40 +01:00
Eike Kettner	54a09861c4	Use model cache with basic annotator	2021-01-17 22:56:33 +01:00
Eike Kettner	4462ebae0f	Resurrect the basic ner classifier	2021-01-17 22:56:33 +01:00
Eike Kettner	a699e87304	Separate ner from classification	2021-01-17 22:56:33 +01:00
Eike Kettner	75986c461f	Fix ner date label boundary reporting	2021-01-10 09:10:39 +01:00
Eike Kettner	fb05e997ab	Provide multiple date suggestions for English Issue: #561	2021-01-10 09:02:26 +01:00
Eike Kettner	53c8d3031d	Skip invalid dates found in texts Fixes: #298	2020-10-02 22:37:15 +02:00
Eike Kettner	c658677032	Autoformat	2020-09-09 00:29:32 +02:00
Eike Kettner	0c97b4ef76	Initial impl of a text classifier based on stanford-nlp	2020-09-02 18:28:14 +02:00
Eike Kettner	96d2f948f2	Use collective's addressbook to configure regexner	2020-08-24 14:40:52 +02:00
Eike Kettner	fdb46da26d	Add french language and upgrade stanford-nlp to 4.0.0	2020-08-23 17:48:42 +02:00
Eike Kettner	9656ba62f4	scalafmtAll	2020-03-26 18:26:00 +01:00
Eike Kettner	8143a4edcc	Adding extraction primitives	2020-02-16 21:37:26 +01:00
Eike Kettner	851ee7ef0f	Reorganize processing code Use separate modules for - text extraction - conversion to pdf - text analysis	2020-02-15 21:25:25 +01:00

38 Commits