docspell

mirror of https://github.com/TheAnachronism/docspell.git synced 2025-08-25 13:33:17 +00:00

Author	SHA1	Message	Date
Eike Kettner	e1bbc2edf5	Apply autoformat	2021-04-10 16:31:58 +02:00
Scala Steward	144ea852bf	Update fs2-core, fs2-io to 2.5.4	2021-03-31 21:10:42 +02:00
Eike Kettner	6a63694a3e	Convert unit tests to munit	2021-03-10 19:48:56 +01:00
Eike Kettner	9991ad5fcc	Add latvian language	2021-03-09 00:23:17 +01:00
Eike Kettner	e6d9ce2c37	Remove obsolete type capabilities. These are now detected by the new scala compiler and lead to compile errors.	2021-03-01 00:16:30 +01:00
Eike Kettner	c7d4c77e6d	Allow more suggestions for date variants in English	2021-02-26 00:35:17 +01:00
Eike Kettner	c7e850116f	Make the text length limit optional	2021-01-22 23:06:50 +01:00
Eike Kettner	249f9e6e2a	Extend guessing tags to all tag categories	2021-01-18 21:51:45 +01:00
Eike Kettner	3f75af0807	Add 9 more languages to the list of document languages	2021-01-18 17:41:40 +01:00
Eike Kettner	26dff18ae0	Add spanish as an example. Adding a new language without nlp now only requires filling in these pieces: define a list of month names to support date recognition; add it to joex' dockerfile to be available for tesseract; update the solr migration/field definitions; update the elm file so it shows up on the client.	2021-01-18 17:41:40 +01:00
Eike Kettner	ff121d462c	Disable memory intensive tests on travis	2021-01-18 17:41:40 +01:00
Eike Kettner	f01646aeb5	Reorganize nlp pipeline and add nlp-unsupported language italian. Improves and reorganizes how nlp pipelines are set up. Users can now choose from many options, depending on their hardware and usage scenario. This is the base for using more languages without depending on what stanford-nlp supports. Support then involves text extraction and simple regex-ner processing.	2021-01-18 17:41:40 +01:00
Eike Kettner	aa937797be	Choose nlp mode in config file	2021-01-17 22:56:33 +01:00
Eike Kettner	54a09861c4	Use model cache with basic annotator	2021-01-17 22:56:33 +01:00
Eike Kettner	a77f67d73a	Make pipeline cache generic to be used with BasicCRFAnnotator	2021-01-17 22:56:33 +01:00
Eike Kettner	4462ebae0f	Resurrect the basic ner classifier	2021-01-17 22:56:33 +01:00
Eike Kettner	a699e87304	Separate ner from classification	2021-01-17 22:56:33 +01:00
Eike Kettner	f02f15e5bd	Move blocker into constructor of text analyser	2021-01-17 22:56:33 +01:00
Eike Kettner	b2b8ad625a	scalafmt	2021-01-17 20:11:58 +01:00
Eike Kettner	75986c461f	Fix ner date label boundary reporting	2021-01-10 09:10:39 +01:00
Eike Kettner	fb05e997ab	Provide multiple date suggestions for English. Issue: #561	2021-01-10 09:02:26 +01:00
Eike Kettner	716252721c	Fix cache clearing. It must be cancelled when obtaining a pipeline.	2021-01-07 23:31:01 +01:00
Eike Kettner	a670bbb6c2	Make idle interval when clearing nlp cache configurable	2021-01-06 23:03:00 +01:00
Eike Kettner	73a9572835	Poc for clearing stanford pipeline after some idle time	2021-01-05 23:56:20 +01:00
Tammo van Lessen	e9347176bd	Fixes an off-by-one classic to also accept dates in January	2020-11-28 00:43:35 +01:00
Eike Kettner	cf6e63785d	Fix potential index-out-of-bounds error in classifier. The stanford library expects a non-empty text.	2020-11-09 00:04:51 +01:00
Eike Kettner	3f697f51aa	Autoformat	2020-10-06 23:31:09 +02:00
Eike Kettner	53c8d3031d	Skip invalid dates found in texts. Fixes: #298	2020-10-02 22:37:15 +02:00
Eike Kettner	c658677032	Autoformat	2020-09-09 00:29:32 +02:00
Eike Kettner	97757876d5	Fix formatting	2020-09-08 00:47:42 +02:00
Eike Kettner	c9bd57592b	Don't use test data if there is just one config. If classifier models cannot be compared, there is no reason to test.	2020-09-07 20:02:50 +02:00
Eike Kettner	316b490008	Implement learning a text classifier from collective data	2020-09-02 18:28:14 +02:00
Eike Kettner	0c97b4ef76	Initial impl of a text classifier based on stanford-nlp	2020-09-02 18:28:14 +02:00
Eike Kettner	96d2f948f2	Use collective's addressbook to configure regexner	2020-08-24 14:40:52 +02:00
Eike Kettner	8628a0a8b3	Allow configuring stanford-ner and cache based on collective	2020-08-24 10:55:59 +02:00
Eike Kettner	fdb46da26d	Add french language and upgrade stanford-nlp to 4.0.0	2020-08-23 17:48:42 +02:00
Eike Kettner	347a029af8	Scalafix organize-imports	2020-06-28 21:20:47 +02:00
Eike Kettner	897d91475e	Update scalafmt-core to 2.6.0	2020-06-17 19:53:56 +02:00
Eike Kettner	075b665c68	Add some more tlds to look for	2020-05-24 11:48:49 +02:00
Eike Kettner	5e6ce1737c	Change recognizing dates with short years. Short years are now added to the current century (2000), so that date strings like 12/26/11 result in 12/26/2011 and not 12/26/1911.	2020-05-17 11:58:51 +02:00
Eike Kettner	c41cdeefec	Update scalafmt to 2.5.1 + scalafmtAll	2020-05-04 23:53:57 +02:00
Eike Kettner	6a1297fc95	Add a limit for text analysis	2020-03-27 22:54:49 +01:00
Eike Kettner	9656ba62f4	scalafmtAll	2020-03-26 18:26:00 +01:00
Eike Kettner	2f87065b2e	sbt scalafmtAll	2020-02-25 20:55:00 +01:00
Eike Kettner	8143a4edcc	Adding extraction primitives	2020-02-16 21:37:26 +01:00
Eike Kettner	851ee7ef0f	Reorganize processing code. Use separate modules for text extraction, conversion to pdf, and text analysis.	2020-02-15 21:25:25 +01:00

46 Commits