eikek 
							
						 
					 
					
						
						
							
						
						2c9e012c96 
					 
					
						
						
							
							Fix URL parsing with trailing slash  
						
						
						
						
						Refs: #1545  
						
						
					 
					
						2022-07-07 15:22:26 +02:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						5ec311c331 
					 
					
						
						
							
							Add Polish to processing languages  
						
						
						
						
						Solr doesn't support Polish out of the box; plugins are required for
it. The language has been added with basic support only. For
better results, a manual setup of Solr is required.
Closes: #1345  
						
						
					 
					
						2022-05-21 14:41:16 +02:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						9d69401fea 
					 
					
						
						
							
							Add Lithuanian to processing languages  
						
						
						
						
						Solr doesn't support Lithuanian; it may be possible to add it via
plugins, which requires a manual setup of Solr. The language has been
added with basic support.
Closes: #1540  
						
						
					 
					
						2022-05-21 14:36:01 +02:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						7fdd78ad06 
					 
					
						
						
							
							Experiment with addons  
						
						
						
						
						Addons allow executing external programs in certain contexts inside
docspell. Currently, they can be run after files are processed.
Addons are provided as URLs to zip files. 
						
						
					 
					
						2022-05-15 23:46:43 +02:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						9eb9497675 
					 
					
						
						
							
							Fix logging in tests  
						
						
						
						
					 
					
						2022-02-19 23:33:01 +01:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						e483a97de7 
					 
					
						
						
							
							Adapt to new logging API  
						
						
						
						
					 
					
						2022-02-19 21:41:38 +01:00 
						 
				 
			
				
					
						
							
							
								Scala Steward 
							
						 
					 
					
						
						
							
						
						652e85ccea 
					 
					
						
						
							
							Reformat with scalafmt 3.3.1  
						
						
						
						
					 
					
						2022-01-02 00:50:55 +01:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						c21b2cdd29 
					 
					
						
						
							
							Update scalafmt to 3.0.8  
						
						
						
						
					 
					
						2021-12-11 22:46:55 +01:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						501c6f2988 
					 
					
						
						
							
							Update Stanford CoreNLP to 4.3.2; add more languages  
						
						
						
						
						Models for Spanish are available and have now been added. Hungarian
has also been added to the list of supported languages (mainly for
tesseract; there are no nlp models). 
						
						
					 
					
						2021-11-20 14:31:39 +01:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						9013f2de5b 
					 
					
						
						
							
							Update scalafmt settings  
						
						
						
						
					 
					
						2021-09-22 17:23:24 +02:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						9785db0683 
					 
					
						
						
							
							Change license header of all files  
						
						
						
						
					 
					
						2021-09-21 22:35:38 +02:00 
						 
				 
			
				
					
						
							
							
								wallace 
							
						 
					 
					
						
						
							
						
						589c41003f 
					 
					
						
						
							
							Add Hebrew document language  
						
						
						
						
					 
					
						2021-08-24 01:19:42 +03:00 
						 
				 
			
				
					
						
							
							
								Scala Steward 
							
						 
					 
					
						
						
							
						
						e4fecefaea 
					 
					
						
						
							
							Reformat with scalafmt 3.0.0  
						
						
						
						
					 
					
						2021-08-19 08:50:30 +02:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						1901fe1a8c 
					 
					
						
						
							
							Replace deprecated fs2 APIs; use fs2.Path  
						
						
						
						
					 
					
						2021-08-07 17:51:56 +02:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						4af8dd0950 
					 
					
						
						
							
							Preprocess Japanese texts to find dates  
						
						
						
						
						Not very efficient, but it should work for finding the position of dates in
Japanese text. 
						
						
					 
					
						2021-07-29 01:35:15 +02:00 
						 
				 
			
				
					
						
							
							
								wallace 
							
						 
					 
					
						
						
							
						
						e8348e2809 
					 
					
						
						
							
							Remove excessive spaces  
						
						
						
						
					 
					
						2021-07-29 02:08:48 +03:00 
						 
				 
			
				
					
						
							
							
								wallace11 
							
						 
					 
					
						
						
							
						
						1095a7d56f 
					 
					
						
						
							
							Add another Japanese test  
						
						
						
						
					 
					
						2021-07-29 01:13:22 +03:00 
						 
				 
			
				
					
						
							
							
								wallace11 
							
						 
					 
					
						
						
							
						
						119a4ffdc9 
					 
					
						
						
							
							Update Japanese tests with more sensible data  
						
						
						
						
					 
					
						2021-07-29 01:08:48 +03:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						f994d4b248 
					 
					
						
						
							
							Add Japanese document language  
						
						
						
						
					 
					
						2021-07-28 20:05:48 +02:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						8e5c88fd32 
					 
					
						
						
							
							Add copyright header to source files  
						
						
						
						
					 
					
						2021-07-04 10:57:53 +02:00 
						 
				 
			
				
					
						
							
							
								eikek 
							
						 
					 
					
						
						
							
						
						bd791b4593 
					 
					
						
						
							
							Upgrade code base to CE3  
						
						
						
						
					 
					
						2021-06-22 22:53:34 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						e1bbc2edf5 
					 
					
						
						
							
							Apply autoformat  
						
						
						
						
					 
					
						2021-04-10 16:31:58 +02:00 
						 
				 
			
				
					
						
							
							
								Scala Steward 
							
						 
					 
					
						
						
							
						
						144ea852bf 
					 
					
						
						
							
							Update fs2-core, fs2-io to 2.5.4  
						
						
						
						
					 
					
						2021-03-31 21:10:42 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						6a63694a3e 
					 
					
						
						
							
							Convert unit tests to munit  
						
						
						
						
					 
					
						2021-03-10 19:48:56 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						9991ad5fcc 
					 
					
						
						
							
							Add Latvian language  
						
						
						
						
					 
					
						2021-03-09 00:23:17 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						e6d9ce2c37 
					 
					
						
						
							
							Remove obsolete type capabilities  
						
						
						
						
						These are now detected by the new Scala compiler and lead to compile
errors. 
						
						
					 
					
						2021-03-01 00:16:30 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						c7d4c77e6d 
					 
					
						
						
							
							Allow more suggestions for date variants in English  
						
						
						
						
					 
					
						2021-02-26 00:35:17 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						c7e850116f 
					 
					
						
						
							
							Make the text length limit optional  
						
						
						
						
					 
					
						2021-01-22 23:06:50 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						249f9e6e2a 
					 
					
						
						
							
							Extend guessing tags to all tag categories  
						
						
						
						
					 
					
						2021-01-18 21:51:45 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3f75af0807 
					 
					
						
						
							
							Add 9 more languages to the list of document languages  
						
						
						
						
					 
					
						2021-01-18 17:41:40 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						26dff18ae0 
					 
					
						
						
							
							Add Spanish as an example  
						
						
						
						
						Adding a new language without nlp support now only requires filling in these
pieces:
- define a list of month names to support date recognition
- add it to joex's Dockerfile so it is available for tesseract
- update the Solr migration/field definitions
- update the Elm file so it shows up on the client 
						
						
					 
					
						2021-01-18 17:41:40 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						ff121d462c 
					 
					
						
						
							
							Disable memory-intensive tests on Travis  
						
						
						
						
					 
					
						2021-01-18 17:41:40 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						f01646aeb5 
					 
					
						
						
							
							Reorganize nlp pipeline and add nlp-unsupported language Italian  
						
						
						
						
						Improves and reorganizes how nlp pipelines are set up. Users can now
choose from several options, depending on their hardware and usage
scenario.
This is the basis for supporting more languages without depending on what
stanford-nlp supports. Support for such languages then involves text
extraction and simple regex-ner processing. 
						
						
					 
					
						2021-01-18 17:41:40 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						aa937797be 
					 
					
						
						
							
							Choose nlp mode in config file  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						54a09861c4 
					 
					
						
						
							
							Use model cache with basic annotator  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						a77f67d73a 
					 
					
						
						
							
							Make pipeline cache generic to be used with BasicCRFAnnotator  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						4462ebae0f 
					 
					
						
						
							
							Resurrect the basic ner classifier  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						a699e87304 
					 
					
						
						
							
							Separate ner from classification  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						f02f15e5bd 
					 
					
						
						
							
							Move blocker into constructor of text analyser  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						b2b8ad625a 
					 
					
						
						
							
							scalafmt  
						
						
						
						
					 
					
						2021-01-17 20:11:58 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						75986c461f 
					 
					
						
						
							
							Fix ner date label boundary reporting  
						
						
						
						
					 
					
						2021-01-10 09:10:39 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						fb05e997ab 
					 
					
						
						
							
							Provide multiple date suggestions for English  
						
						
						
						
						Issue: #561  
						
						
					 
					
						2021-01-10 09:02:26 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						716252721c 
					 
					
						
						
							
							Fix cache clearing  
						
						
						
						
						It must be cancelled when obtaining a pipeline. 
						
						
					 
					
						2021-01-07 23:31:01 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						a670bbb6c2 
					 
					
						
						
							
							Make idle interval when clearing nlp cache configurable  
						
						
						
						
					 
					
						2021-01-06 23:03:00 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						73a9572835 
					 
					
						
						
							
							PoC for clearing the Stanford pipeline after some idle time  
						
						
						
						
					 
					
						2021-01-05 23:56:20 +01:00 
						 
				 
			
				
					
						
							
							
								Tammo van Lessen 
							
						 
					 
					
						
						
							
						
						e9347176bd 
					 
					
						
						
							
							Fix a classic off-by-one error to also accept dates in January  
						
						
						
						
					 
					
						2020-11-28 00:43:35 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						cf6e63785d 
					 
					
						
						
							
							Fix potential index-out-of-bounds error in classifier  
						
						
						
						
						The Stanford library expects a non-empty text. 
						
						
					 
					
						2020-11-09 00:04:51 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3f697f51aa 
					 
					
						
						
							
							Autoformat  
						
						
						
						
					 
					
						2020-10-06 23:31:09 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						53c8d3031d 
					 
					
						
						
							
							Skip invalid dates found in texts  
						
						
						
						
						Fixes: #298  
					
						2020-10-02 22:37:15 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						c658677032 
					 
					
						
						
							
							Autoformat  
						
						
						
						
					 
					
						2020-09-09 00:29:32 +02:00