Eike Kettner 
							
						 
					 
					
						
						
							
						
						e6d9ce2c37 
					 
					
						
						
							
							Remove obsolete type capabilities  
						
						
						
						
						These are now detected by the new scala compiler and lead to compile
errors. 
						
						
					 
					
						2021-03-01 00:16:30 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						c7d4c77e6d 
					 
					
						
						
							
							Allow more suggestions for date variants in English  
						
						
						
						
					 
					
						2021-02-26 00:35:17 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						c7e850116f 
					 
					
						
						
							
							Make the text length limit optional  
						
						
						
						
					 
					
						2021-01-22 23:06:50 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						249f9e6e2a 
					 
					
						
						
							
							Extend guessing tags to all tag categories  
						
						
						
						
					 
					
						2021-01-18 21:51:45 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3f75af0807 
					 
					
						
						
							
							Add 9 more languages to the list of document languages  
						
						
						
						
					 
					
						2021-01-18 17:41:40 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						26dff18ae0 
					 
					
						
						
							
							Add spanish as an example  
						
						
						
						
						Adding a new language without nlp now only requires filling in these
pieces:
- define a list of month names to support date recognition
- add it to joex' dockerfile to be available for tesseract
- update the solr migration/field definitions
- update the elm file so it shows up on the client 
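
The first step above (a list of month names to support date recognition) can be sketched roughly as follows. This is an illustration only, not the project's actual code: the object `MonthNames`, the value `spanish`, the function `parse` and the "day de month de year" pattern are all hypothetical names and assumptions.

```scala
import java.time.LocalDate
import scala.util.Try

object MonthNames {
  // Hypothetical month list for Spanish; the position in the list
  // plus one gives the month number (enero -> 1, diciembre -> 12).
  val spanish: Vector[String] = Vector(
    "enero", "febrero", "marzo", "abril", "mayo", "junio",
    "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre"
  )

  // Assumed date shape: "18 de enero de 2021".
  private val Pattern = raw"(\d{1,2}) de (\p{L}+) de (\d{4})".r

  // Returns None for unknown month names and for impossible dates.
  def parse(text: String): Option[LocalDate] =
    text.toLowerCase match {
      case Pattern(d, m, y) =>
        val idx = spanish.indexOf(m)
        if (idx < 0) None
        else Try(LocalDate.of(y.toInt, idx + 1, d.toInt)).toOption
      case _ => None
    }
}
```

With such a list in place, `MonthNames.parse("18 de enero de 2021")` yields a date, while an impossible date such as "31 de febrero de 2021" is skipped because `LocalDate.of` rejects it.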
						
						
					 
					
						2021-01-18 17:41:40 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						ff121d462c 
					 
					
						
						
							
							Disable memory intensive tests on travis  
						
						
						
						
					 
					
						2021-01-18 17:41:40 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						f01646aeb5 
					 
					
						
						
							
							Reorganize nlp pipeline and add nlp-unsupported language italian  
						
						
						
						
						Improves and reorganizes how nlp pipelines are set up. Users can now
choose from many options, depending on their hardware and usage
scenario.
This is the basis for supporting more languages without depending on
what stanford-nlp supports. Support for those languages is then limited
to text extraction and simple regex-ner processing. 
						
						
					 
					
						2021-01-18 17:41:40 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						aa937797be 
					 
					
						
						
							
							Choose nlp mode in config file  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						54a09861c4 
					 
					
						
						
							
							Use model cache with basic annotator  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						a77f67d73a 
					 
					
						
						
							
							Make pipeline cache generic to be used with BasicCRFAnnotator  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						4462ebae0f 
					 
					
						
						
							
							Resurrect the basic ner classifier  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						a699e87304 
					 
					
						
						
							
							Separate ner from classification  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						f02f15e5bd 
					 
					
						
						
							
							Move blocker into constructor of text analyser  
						
						
						
						
					 
					
						2021-01-17 22:56:33 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						b2b8ad625a 
					 
					
						
						
							
							scalafmt  
						
						
						
						
					 
					
						2021-01-17 20:11:58 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						75986c461f 
					 
					
						
						
							
							Fix ner date label boundary reporting  
						
						
						
						
					 
					
						2021-01-10 09:10:39 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						fb05e997ab 
					 
					
						
						
							
							Provide multiple date suggestions for English  
						
						
						
						
						Issue: #561  
						
						
					 
					
						2021-01-10 09:02:26 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						716252721c 
					 
					
						
						
							
							Fix cache clearing  
						
						
						
						
						It must be cancelled when obtaining a pipeline. 
						
						
					 
					
						2021-01-07 23:31:01 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						a670bbb6c2 
					 
					
						
						
							
							Make idle interval when clearing nlp cache configurable  
						
						
						
						
					 
					
						2021-01-06 23:03:00 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						73a9572835 
					 
					
						
						
							
							Poc for clearing stanford pipeline after some idle time  
						
						
						
						
					 
					
						2021-01-05 23:56:20 +01:00 
						 
				 
			
				
					
						
							
							
								Tammo van Lessen 
							
						 
					 
					
						
						
							
						
						e9347176bd 
					 
					
						
						
							
							Fixes an off-by-one classic to also accept dates in January  
						
						
						
						
					 
					
						2020-11-28 00:43:35 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						cf6e63785d 
					 
					
						
						
							
							Fix potential index-out-of-bounds error in classifier  
						
						
						
						
						The stanford library expects a non-empty text. 
						
						
					 
					
						2020-11-09 00:04:51 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3f697f51aa 
					 
					
						
						
							
							Autoformat  
						
						
						
						
					 
					
						2020-10-06 23:31:09 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						53c8d3031d 
					 
					
						
						
							
							Skip invalid dates found in texts  
						
						
						
						
						Fixes: #298  
					
						2020-10-02 22:37:15 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						c658677032 
					 
					
						
						
							
							Autoformat  
						
						
						
						
					 
					
						2020-09-09 00:29:32 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						97757876d5 
					 
					
						
						
							
							Fix formatting  
						
						
						
						
					 
					
						2020-09-08 00:47:42 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						c9bd57592b 
					 
					
						
						
							
							Don't use test data if there is just one config  
						
						
						
						
						If classifier models cannot be compared, there is no reason to test. 
						
						
					 
					
						2020-09-07 20:02:50 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						316b490008 
					 
					
						
						
							
							Implement learning a text classifier from collective data  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						0c97b4ef76 
					 
					
						
						
							
							Initial impl of a text classifier based on stanford-nlp  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						96d2f948f2 
					 
					
						
						
							
							Use collective's addressbook to configure regexner  
						
						
						
						
					 
					
						2020-08-24 14:40:52 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						8628a0a8b3 
					 
					
						
						
							
							Allow configuring stanford-ner and cache based on collective  
						
						
						
						
					 
					
						2020-08-24 10:55:59 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						fdb46da26d 
					 
					
						
						
							
							Add french language and upgrade stanford-nlp to 4.0.0  
						
						
						
						
					 
					
						2020-08-23 17:48:42 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						347a029af8 
					 
					
						
						
							
							Scalafix organize-imports  
						
						
						
						
					 
					
						2020-06-28 21:20:47 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						897d91475e 
					 
					
						
						
							
							Update scalafmt-core to 2.6.0  
						
						
						
						
					 
					
						2020-06-17 19:53:56 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						075b665c68 
					 
					
						
						
							
							Add some more tlds to look for  
						
						
						
						
					 
					
						2020-05-24 11:48:49 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						5e6ce1737c 
					 
					
						
						
							
							Change recognizing dates with short years  
						
						
						
						
						Short years are now added to the current century (2000) such that date
strings like 12/26/11 result in 12/26/2011 and not 12/26/1911. 
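
A minimal sketch of the behaviour described above, assuming a month/day/year string and that every two-digit year belongs to the 2000s. The names `ShortYear`, `expandYear` and `normalize` are made up for this illustration and are not taken from the code base.

```scala
object ShortYear {
  // Assumed input shape: month/day/year with a 2- or 4-digit year.
  private val UsDate = raw"(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})".r

  // Two-digit years are added to 2000; other years stay unchanged.
  def expandYear(year: Int): Int =
    if (year >= 0 && year < 100) 2000 + year else year

  // Rewrites the year part; returns None if the string is not a date
  // of the assumed shape.
  def normalize(date: String): Option[String] =
    date match {
      case UsDate(m, d, y) => Some(s"$m/$d/${expandYear(y.toInt)}")
      case _               => None
    }
}
```

For example, `ShortYear.normalize("12/26/11")` gives `Some("12/26/2011")`, and a date that already has a four-digit year is returned as is.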
						
						
					 
					
						2020-05-17 11:58:51 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						c41cdeefec 
					 
					
						
						
							
							Update scalafmt to 2.5.1 + scalafmtAll  
						
						
						
						
					 
					
						2020-05-04 23:53:57 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						6a1297fc95 
					 
					
						
						
							
							Add a limit for text analysis  
						
						
						
						
					 
					
						2020-03-27 22:54:49 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						9656ba62f4 
					 
					
						
						
							
							scalafmtAll  
						
						
						
						
					 
					
						2020-03-26 18:26:00 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						2f87065b2e 
					 
					
						
						
							
							sbt scalafmtAll  
						
						
						
						
					 
					
						2020-02-25 20:55:00 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						8143a4edcc 
					 
					
						
						
							
							Adding extraction primitives  
						
						
						
						
					 
					
						2020-02-16 21:37:26 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						851ee7ef0f 
					 
					
						
						
							
							Reorganize processing code  
						
						
						
						
						Use separate modules for
- text extraction
- conversion to pdf
- text analysis 
						
						
					 
					
						2020-02-15 21:25:25 +01:00