Eike Kettner 
							
						 
					 
					
						
						
							
						
						290989f67f 
					 
					
						
						
							
							Reorder correspondent person suggestion based on org relationship  
						
						
						
						
					 
					
						2020-12-01 23:39:45 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3fabe0a582 
					 
					
						
						
							
							Update to Scala 2.13.4  
						
						
						
						
					 
					
						2020-11-27 20:26:24 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						5fe532001b 
					 
					
						
						
							
							Allow specifying the document language with the request  
						
						
						
						
					 
					
						2020-11-23 20:49:01 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						5034e12bec 
					 
					
						
						
							
							Add a subject filter to scan-mailbox args  
						
						
						
						
					 
					
						2020-11-13 23:15:20 +01:00 
						 
				 
			
				
					
						
							
							
								mergify[bot] 
							
						 
					 
					
						
						
							
						
						e5ce1fd45f 
					 
					
						
						
							
							Merge pull request  #437  from eikek/upload-improvements  
						
						
						
						
						Upload improvements 
						
						
					 
					
						2020-11-12 22:58:08 +00:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						4fd6e02ec0 
					 
					
						
						
							
							Improve glob and filter archive entries  
						
						
						
						
					 
					
						2020-11-11 21:01:23 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						27eb5d70de 
					 
					
						
						
							
							Apply given tags in processing step  
						
						
						
						
						Issue: #346  
						
						
					 
					
						2020-11-11 21:01:23 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						55a6f7aaf6 
					 
					
						
						
							
							Add more properties to upload meta data  
						
						
						
						
					 
					
						2020-11-11 21:01:23 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						746e04c624 
					 
					
						
						
							
							Improve logging when creating preview images  
						
						
						
						
					 
					
						2020-11-10 22:25:46 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						10305bc82d 
					 
					
						
						
							
							Minor improvements  
						
						
						
						
					 
					
						2020-11-09 21:16:53 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						29455d638c 
					 
					
						
						
							
							Add startup task to find page counts of existing files  
						
						
						
						
					 
					
						2020-11-09 20:35:35 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						a77f34b7ba 
					 
					
						
						
							
							Add a processing step to retrieve page counts  
						
						
						
						
					 
					
						2020-11-09 11:08:24 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						f4e50c5229 
					 
					
						
						
							
							Provide endpoints to submit tasks to re-generate previews  
						
						
						
						
						The scaling factor can be given in the config file. When this changes,
images can be regenerated via POSTing to certain endpoints. It is
possible to regenerate just one attachment preview or all within a
collective. 
						
						
					 
					
						2020-11-09 09:00:02 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						6037b54959 
					 
					
						
						
							
							Don't fail processing if generating preview fails  
						
						
						
						
					 
					
						2020-11-09 00:05:11 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						709848244c 
					 
					
						
						
							
							Create tasks to generate all previews  
						
						
						
						
						There is a task to generate preview images per attachment. It can
either add them (if not present yet) or overwrite them (e.g. some
config has changed).
There is a task that selects all attachments without previews and
submits a task to create them. This is submitted automatically on start
to generate previews for all existing attachments. 
						
						
					 
					
						2020-11-08 23:46:02 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						7ba6baf6f0 
					 
					
						
						
							
							Make preview image smaller  
						
						
						
						
					 
					
						2020-11-08 15:12:56 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						6db5c39d78 
					 
					
						
						
							
							Fix converted filename  
						
						
						
						
						Mark it by default with a string from the config file.
Issue: #397 
						
						
					 
					
						2020-11-08 09:45:03 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						ef7cb4e779 
					 
					
						
						
							
							Create a preview image of all files during processing  
						
						
						
						
					 
					
						2020-11-08 01:25:59 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						ab1139523a 
					 
					
						
						
							
							Let the convert-all task retry when pdf conversion fails  
						
						
						
						
					 
					
						2020-10-26 23:39:26 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						b59696a9d3 
					 
					
						
						
							
							Make sure to only remove/retry items in premature states  
						
						
						
						
					 
					
						2020-10-26 23:39:26 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						26e89bf84e 
					 
					
						
						
							
							Edit org/person/equipment of multiple items  
						
						
						
						
					 
					
						2020-10-26 13:35:47 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						2e6026b817 
					 
					
						
						
							
							Edit dates of multiple items  
						
						
						
						
					 
					
						2020-10-26 13:16:03 +01:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3c0b86cb19 
					 
					
						
						
							
							Fix regex patterns used for NER  
						
						
						
						
						Patterns are split on whitespace by the nlp library and then compiled,
so each "word" must be a valid regex.
Fixes: #356  
						
						
					 
					
						2020-10-21 00:55:14 +02:00 
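
The constraint described in the commit above (patterns are split on whitespace and each piece is compiled separately) can be illustrated with a small sketch. This is not the project's code: the helper name `to_regexner_pattern` and the sample name are hypothetical, and the actual fix may handle the words differently. It only shows why escaping word by word keeps every piece a valid regex.

```python
import re


def to_regexner_pattern(name: str) -> str:
    """Hypothetical helper: escape each whitespace-separated word of a
    name so that every word is a valid regex on its own."""
    return " ".join(re.escape(word) for word in name.split())


raw = "Smith (Berlin Office)"

# Splitting the raw name yields "(Berlin", which is not a valid regex
# by itself because the group is never closed.
try:
    re.compile(raw.split()[1])
    raw_word_is_valid = True
except re.error:
    raw_word_is_valid = False

# After escaping word by word, every piece compiles independently.
escaped = to_regexner_pattern(raw)
compiled = [re.compile(word) for word in escaped.split()]
```

Escaping per word, rather than escaping the whole string at once, matters here because the split happens before compilation.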
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3f697f51aa 
					 
					
						
						
							
							Autoformat  
						
						
						
						
					 
					
						2020-10-06 23:31:09 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						d4354b8b49 
					 
					
						
						
							
							Skip pdf conversion if a converted file exists  
						
						
						
						
						For images the conversion also returns the extracted text. If saving
this text failed, it is extracted again in the following
text-extraction step. 
						
						
					 
					
						2020-10-02 17:39:39 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						b6f23b038a 
					 
					
						
						
							
							Fix finding attachments for retries  
						
						
						
						
						The attachments to process again must be searched in sources and
archives, too. 
						
						
					 
					
						2020-10-02 17:39:34 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						5e21552358 
					 
					
						
						
							
							Don't do duplicate check on retries  
						
						
						
						
					 
					
						2020-10-02 16:50:52 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						f6f63000be 
					 
					
						
						
							
							Prepend a duplicate check when uploading files  
						
						
						
						
					 
					
						2020-09-23 23:37:00 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						c658677032 
					 
					
						
						
							
							Autoformat  
						
						
						
						
					 
					
						2020-09-09 00:29:32 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						76ccfb8a81 
					 
					
						
						
							
							Only learn from confirmed items  
						
						
						
						
						Text classification should only learn from confirmed items. Log if
classification is disabled when processing an item. 
						
						
					 
					
						2020-09-07 13:04:40 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						4309bd8dfd 
					 
					
						
						
							
							Some cleanup  
						
						
						
						
					 
					
						2020-09-02 21:22:30 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						237b960625 
					 
					
						
						
							
							Guess a tag on item processing using a trained model if available  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						316b490008 
					 
					
						
						
							
							Implement learning a text classifier from collective data  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						68bb65572b 
					 
					
						
						
							
							Integrate learn-classifier task into the app  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						0c97b4ef76 
					 
					
						
						
							
							Initial impl of a text classifier based on stanford-nlp  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						8c4f2e702b 
					 
					
						
						
							
							Add classifier settings  
						
						
						
						
					 
					
						2020-09-02 18:28:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3473cbb773 
					 
					
						
						
							
							Use collective data with NER annotation  
						
						
						
						
					 
					
						2020-08-25 20:40:44 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						96d2f948f2 
					 
					
						
						
							
							Use collective's addressbook to configure regexner  
						
						
						
						
					 
					
						2020-08-24 14:40:52 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						8628a0a8b3 
					 
					
						
						
							
							Allow configuring stanford-ner and cache based on collective  
						
						
						
						
					 
					
						2020-08-24 10:55:59 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3986487f11 
					 
					
						
						
							
							Add api docs and cleanup  
						
						
						
						
					 
					
						2020-08-13 21:22:54 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						41ea071555 
					 
					
						
						
							
							Add a task to convert all pdfs that have not been converted  
						
						
						
						
					 
					
						2020-08-13 01:06:13 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						07e9a9767e 
					 
					
						
						
							
							Add a task to re-process files of an item  
						
						
						
						
					 
					
						2020-08-12 22:29:56 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						09d74b7e80 
					 
					
						
						
							
							Return item notes with search results  
						
						
						
						
						To keep the response from getting very large, an admin can define a
limit on how much to return. 
						
						
					 
					
						2020-08-05 00:09:37 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						45b0deeced 
					 
					
						
						
							
							Print solr url on start  
						
						
						
						
						This is useful info to see which url has been selected, same as db
connection. 
						
						
					 
					
						2020-08-01 15:59:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						1fc57fc2b2 
					 
					
						
						
							
							Set default value for min-text-len to 500  
						
						
						
						
						This value is used to decide whether to try OCR or not. If the extracted
text is shorter than this value, OCR is run and both results are compared. It
was set to 10, which is just one or two words. Since docspell deals with
documents, this value is too low. 
						
						
					 
					
						2020-08-01 15:46:00 +02:00 
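
The threshold check described in the commit above can be sketched as follows. This is an illustration only, not the project's code: the function names are hypothetical, and since the commit message does not say how the two results are compared, keeping the longer text is an assumption made for the example.

```python
MIN_TEXT_LEN = 500  # default value named in the commit message


def needs_ocr(extracted_text: str, min_text_len: int = MIN_TEXT_LEN) -> bool:
    """Return True when the extracted text is too short to be trusted."""
    return len(extracted_text) < min_text_len


def pick_text(extracted_text: str, ocr_text: str) -> str:
    """Assumed comparison rule (not from the source): keep the longer text."""
    return ocr_text if len(ocr_text) > len(extracted_text) else extracted_text


short_text = "Invoice 42"
# With the old threshold of 10, a ten-character snippet already skipped OCR.
skipped_before = not needs_ocr(short_text, min_text_len=10)
# With the new default of 500, the same snippet triggers OCR.
triggers_now = needs_ocr(short_text)
```

The two flags at the end show the effect of the change: text of only one or two words no longer counts as a successful extraction.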
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						cec4948710 
					 
					
						
						
							
							Add pdf meta data to extracted text to add it to full-text index  
						
						
						
						
					 
					
						2020-07-19 01:07:49 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						209c068436 
					 
					
						
						
							
							Use keywords in pdfs to search for existing tags  
						
						
						
						
						During processing, keywords stored in PDF metadata are looked up
in the tag database and any matching existing tags are associated with the
item.
See #175  
						
						
					 
					
						2020-07-19 00:28:04 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						bd20165d1a 
					 
					
						
						
							
							Use given folder-id when adding initial fts docs  
						
						
						
						
					 
					
						2020-07-18 23:04:01 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						3d49ceaab5 
					 
					
						
						
							
							Use ocrmypdf tool to create pdf/a during conversion  
						
						
						
						
						- Use another external tool to convert pdf to pdf which also adds the
  extracted text as another layer into the pdf
- Although not used, the external conversion routine will now check
  for an existing text file that is named as the pdf file with extension
  `.txt`. If present it is included in the conversion result and will be
  used as the extracted text.
- Text extraction for pdf files now happens on the converted file,
  because it may already contain the text from the conversion step;
  this avoids running OCR twice.
- No error during conversion is fatal; processing continues
  without a converted file. 
						
						
					 
					
						2020-07-18 17:19:29 +02:00 
						 
				 
			
				
					
						
							
							
								Eike Kettner 
							
						 
					 
					
						
						
							
						
						5b01c93711 
					 
					
						
						
							
							Add a folder-id to item processing  
						
						
						
						
						This allows defining a folder when uploading files. All generated
items are associated with this folder on creation. 
						
						
					 
					
						2020-07-14 23:18:39 +02:00