Eike Kettner 
							
						 
					 
					
						
						
							
						
						b59696a9d3 
					 
					
						
						
							
							Make sure to only remove/retry items in premature states  
						
						
						
						
					 
					
						2020-10-26 23:39:26 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						26e89bf84e 
					 
					
						
						
							
							Edit org/person/equipment of multiple items  
						
						
						
						
					 
					
						2020-10-26 13:35:47 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						2e6026b817 
					 
					
						
						
							
							Edit dates of multiple items  
						
						
						
						
					 
					
						2020-10-26 13:16:03 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3c0b86cb19 
					 
					
						
						
							
							Fix regex patterns used for NER  
						
						
						
						
						Patterns are split on whitespace by the nlp library and then compiled,
so each "word" must be a valid regex.
Fixes: #356  
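
A minimal sketch of the problem described in this commit message, using Python's `re` module as a stand-in for the regex engine of the nlp library (the real engine and its API differ). The helper names `tokens_compile` and `to_token_safe_pattern` are hypothetical and only illustrate the idea: a pattern can be a valid regex as a whole and still break once it is split on whitespace.

```python
import re


def tokens_compile(pattern: str) -> bool:
    """Return True if every whitespace-separated token is a valid regex."""
    for token in pattern.split():
        try:
            re.compile(token)
        except re.error:
            return False
    return True


def to_token_safe_pattern(name: str) -> str:
    """Escape each word on its own so that every token stays a valid regex."""
    return " ".join(re.escape(word) for word in name.split())


name = "(Acme Corp)"
re.compile(name)                      # valid as a whole: one balanced group
broken = tokens_compile(name)         # "(Acme" and "Corp)" are unbalanced alone
safe = to_token_safe_pattern(name)    # each word escaped separately
```

Escaping word by word is only one possible fix under these assumptions; the commit itself does not say which approach was taken.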
						
						
					 
					
						2020-10-21 00:55:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3f697f51aa 
					 
					
						
						
							
							Autoformat  
						
						
						
						
					 
					
						2020-10-06 23:31:09 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						d4354b8b49 
					 
					
						
						
							
							Skip pdf conversion if a converted file exists  
						
						
						
						
						For images, the conversion also returns the extracted text. If saving
that text failed, it is extracted again in the following
text-extraction step. 
						
						
					 
					
						2020-10-02 17:39:39 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						b6f23b038a 
					 
					
						
						
							
							Fix finding attachments for retries  
						
						
						
						
						Attachments that are processed again must also be searched for in
sources and archives. 
						
						
					 
					
						2020-10-02 17:39:34 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						5e21552358 
					 
					
						
						
							
							Don't do duplicate check on retries  
						
						
						
						
					 
					
						2020-10-02 16:50:52 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						f6f63000be 
					 
					
						
						
							
							Prepend a duplicate check when uploading files  
						
						
						
						
					 
					
						2020-09-23 23:37:00 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						c658677032 
					 
					
						
						
							
							Autoformat  
						
						
						
						
					 
					
						2020-09-09 00:29:32 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						76ccfb8a81 
					 
					
						
						
							
							Only learn from confirmed items  
						
						
						
						
						Text classification should only learn from confirmed items. Log if
classification is disabled when processing an item. 
						
						
					 
					
						2020-09-07 13:04:40 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						4309bd8dfd 
					 
					
						
						
							
							Some cleanup  
						
						
						
						
					 
					
						2020-09-02 21:22:30 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						237b960625 
					 
					
						
						
							
							Guess a tag on item processing using a trained model if available  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						316b490008 
					 
					
						
						
							
							Implement learning a text classifier from collective data  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						68bb65572b 
					 
					
						
						
							
							Integrate learn-classifier task into the app  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						0c97b4ef76 
					 
					
						
						
							
							Initial impl of a text classifier based on stanford-nlp  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						8c4f2e702b 
					 
					
						
						
							
							Add classifier settings  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3473cbb773 
					 
					
						
						
							
							Use collective data with NER annotation  
						
						
						
						
					 
					
						2020-08-25 20:40:44 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						96d2f948f2 
					 
					
						
						
							
							Use collective's addressbook to configure regexner  
						
						
						
						
					 
					
						2020-08-24 14:40:52 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						8628a0a8b3 
					 
					
						
						
							
							Allow configuring stanford-ner and cache based on collective  
						
						
						
						
					 
					
						2020-08-24 10:55:59 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3986487f11 
					 
					
						
						
							
							Add api docs and cleanup  
						
						
						
						
					 
					
						2020-08-13 21:22:54 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						41ea071555 
					 
					
						
						
							
							Add a task to convert all pdfs that have not been converted  
						
						
						
						
					 
					
						2020-08-13 01:06:13 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						07e9a9767e 
					 
					
						
						
							
							Add a task to re-process files of an item  
						
						
						
						
					 
					
						2020-08-12 22:29:56 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						09d74b7e80 
					 
					
						
						
							
							Return item notes with search results  
						
						
						
						
						To keep the response from becoming very large, an admin can define a
limit on how much to return. 
						
						
					 
					
						2020-08-05 00:09:37 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						45b0deeced 
					 
					
						
						
							
							Print solr url on start  
						
						
						
						
						This is useful for seeing which url has been selected, as with the db
connection. 
						
						
					 
					
						2020-08-01 15:59:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						1fc57fc2b2 
					 
					
						
						
							
							Set default value for min-text-len to 500  
						
						
						
						
						This value is used to decide whether to try OCR or not. If the text is
shorter than this value, OCR is run and both results are compared. It was
set to 10, which is just one or two words. Since docspell deals with
documents, that value was too low. 
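
The decision described in this commit message could be sketched roughly as follows. The commit does not say how the two results are compared, so preferring the longer text is purely an assumption here, and `choose_text` and `run_ocr` are hypothetical names rather than docspell's API.

```python
from typing import Callable

MIN_TEXT_LEN = 500  # default value named in the commit message


def choose_text(extracted: str,
                run_ocr: Callable[[], str],
                min_text_len: int = MIN_TEXT_LEN) -> str:
    """Return the extracted text, falling back to OCR when it is too short."""
    stripped = extracted.strip()
    if len(stripped) >= min_text_len:
        # Enough text was found; the expensive OCR step is skipped.
        return stripped
    ocr_text = run_ocr().strip()
    # Assumption: "comparing" the results means keeping the longer one.
    return ocr_text if len(ocr_text) > len(stripped) else stripped


ocr_calls = []


def fake_ocr() -> str:
    """Stand-in for a real OCR call; records that it was invoked."""
    ocr_calls.append(1)
    return "x" * 800


long_result = choose_text("a" * 600, fake_ocr)    # long enough, no OCR
short_result = choose_text("two words", fake_ocr)  # too short, OCR runs
```

With the old threshold of 10, the second call would have accepted "two words" without ever trying OCR.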
						
						
					 
					
						2020-08-01 15:46:00 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						cec4948710 
					 
					
						
						
							
							Add pdf meta data to extracted text to add it to full-text index  
						
						
						
						
					 
					
						2020-07-19 01:07:49 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						209c068436 
					 
					
						
						
							
							Use keywords in pdfs to search for existing tags  
						
						
						
						
						During processing, keywords stored in the PDF metadata are looked up in
the tag database and any existing tags that match are associated with
the item.
See #175  
						
						
					 
					
						2020-07-19 00:28:04 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						bd20165d1a 
					 
					
						
						
							
							Use given folder-id when adding initial fts docs  
						
						
						
						
					 
					
						2020-07-18 23:04:01 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3d49ceaab5 
					 
					
						
						
							
							Use ocrmypdf tool to create pdf/a during conversion  
						
						
						
						
						- Use another external tool to convert pdf to pdf which also adds the
  extracted text as another layer into the pdf
- Although not used currently, the external conversion routine now checks
  for an existing text file named like the pdf file but with the extension
  `.txt`. If present, it is included in the conversion result and
  used as the extracted text.
- Text extraction for pdf files now happens on the converted file,
  because it may already contain the text from the conversion step;
  this avoids running OCR twice.
- No error during conversion is fatal; processing continues
  without a converted file. 
						
						
					 
					
						2020-07-18 17:19:29 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						5b01c93711 
					 
					
						
						
							
							Add a folder-id to item processing  
						
						
						
						
						This allows defining a folder when uploading files. All generated
items are associated with this folder on creation. 
						
						
					 
					
						2020-07-14 23:18:39 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						259526a088 
					 
					
						
						
							
							Organize imports  
						
						
						
						
					 
					
						2020-07-12 13:51:52 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						22fa1dba13 
					 
					
						
						
							
							Apply folder restriction to fulltext only search  
						
						
						
						
						And update index when folder changes. 
						
						
					 
					
						2020-07-12 13:50:45 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						aeba4ba913 
					 
					
						
						
							
							Refactor full-text migrations and add folder to solr schema  
						
						
						
						
					 
					
						2020-07-12 13:50:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						e387b5513f 
					 
					
						
						
							
							Remove items in non-member folders from sql search results  
						
						
						
						
					 
					
						2020-07-11 22:25:56 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						752a94a9e2 
					 
					
						
						
							
							Implement space operations  
						
						
						
						
					 
					
						2020-07-11 01:30:28 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						347a029af8 
					 
					
						
						
							
							Scalafix organize-imports  
						
						
						
						
					 
					
						2020-06-28 21:20:47 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						41c0f70d3b 
					 
					
						
						
							
							Fix cancelling jobs  
						
						
						
						
						A request to cancel a job was not processed correctly. The cancelling
routine of a task must run, regardless of the (non-final) state. Now
it works like this: if a job is currently running, it is interrupted
and its cancel routine is invoked. It then enters "cancelled" state.
If it is stuck, it is loaded and only its cancel routine is run. If it
is in a final state or waiting, it is removed from the queue. 
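
The cancel behaviour described above can be read as a small dispatch on the job state. The following is only a sketch of that reading: the state names other than "running", "stuck", "waiting" and "cancelled", as well as the action labels, are assumptions made for illustration and are not taken from the code base.

```python
from enum import Enum
from typing import List


class JobState(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    STUCK = "stuck"
    # Final states; apart from "cancelled" the names are assumed.
    SUCCESS = "success"
    FAILED = "failed"
    CANCELLED = "cancelled"


def cancel_actions(state: JobState) -> List[str]:
    """List the steps a cancel request triggers for a job in `state`."""
    if state is JobState.RUNNING:
        # Interrupt the job, run its cancel routine, then mark it cancelled.
        return ["interrupt", "run-cancel-routine", "set-cancelled"]
    if state is JobState.STUCK:
        # Nothing is running: load the job and run only its cancel routine.
        return ["load-job", "run-cancel-routine"]
    # Waiting or final: the job is simply removed from the queue.
    return ["remove-from-queue"]


plan = {state.value: cancel_actions(state) for state in JobState}
```

The point of the fix is visible in the first two branches: the cancel routine runs for every non-final state in which the task may already have done work.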
						
						
					 
					
						2020-06-26 23:08:27 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						d79ae6233a 
					 
					
						
						
							
							Restrict proposals for due date  
						
						
						
						
						Avoid dates too far in the future. 
						
						
					 
					
						2020-06-26 16:58:17 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						91da3b149e 
					 
					
						
						
							
							Reducing default retries to 2  
						
						
						
						
						Many errors cannot be recovered from by retrying. There is currently
no way to distinguish these states, so the default is now set to a lower
value to avoid long wait times until an item arrives. 
						
						
					 
					
						2020-06-25 23:57:01 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						dc8f1a0387 
					 
					
						
						
							
							Fix global re-index task to re-create the schema  
						
						
						
						
						Otherwise new instances could not be re-indexed. 
						
						
					 
					
						2020-06-25 23:02:06 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						14213c4c27 
					 
					
						
						
							
							Allow some solr query options in the config file  
						
						
						
						
					 
					
						2020-06-24 23:37:20 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						532caed84c 
					 
					
						
						
							
							Consistent logging of request/responses to solr  
						
						
						
						
						Using a middleware. Also add missing changesets for mariadb. 
						
						
					 
					
						2020-06-24 21:25:46 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						47697a8056 
					 
					
						
						
							
							Set some logs to trace  
						
						
						
						
					 
					
						2020-06-24 01:16:13 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						e06a3f8fdd 
					 
					
						
						
							
							ScalafmtAll  
						
						
						
						
					 
					
						2020-06-23 00:18:59 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						ffbb16db45 
					 
					
						
						
							
							Transport highlighting information to the client  
						
						
						
						
					 
					
						2020-06-23 00:17:29 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						cfe5aa8894 
					 
					
						
						
							
							Use no-op fts-client if disabled + push this flag to the webui  
						
						
						
						
					 
					
						2020-06-21 21:06:08 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						0d8b03fc61 
					 
					
						
						
							
							Add backend operations for re-creating the full-text index  
						
						
						
						
					 
					
						2020-06-21 15:46:51 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						14ea4091c4 
					 
					
						
						
							
							Renaming things  
						
						
						
						
					 
					
						2020-06-21 13:15:02 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						2f6e531c45 
					 
					
						
						
							
							Refactoring index migration task  
						
						
						
						
					 
					
						2020-06-21 01:37:23 +02:00