docspell

mirror of https://github.com/TheAnachronism/docspell.git synced 2025-06-23 02:48:26 +00:00

Author	SHA1	Message	Date
eikek	868285a26b	Fix fulltext search queries for new collective-id	2022-08-07 16:28:22 +02:00
eikek	53d92c4a26	Adapt backend to collective-id	2022-08-07 16:26:04 +02:00
eikek	5ec311c331	Add Polish to processing languages SOLR doesn't support Polish out of the box. Plugins are required for Polish. The language has been added only with basic support. For better results, a manual setup of solr is required. Closes: #1345	2022-05-21 14:41:16 +02:00
eikek	9d69401fea	Add Lithuanian to processing languages SOLR doesn't support Lithuanian, maybe it can be added via plugins. A manual setup of solr is required then. It has been added with basic support. Closes: #1540	2022-05-21 14:36:01 +02:00
eikek	029335e607	Working poc of postgresql based fulltext search backend	2022-03-21 11:04:26 +01:00
eikek	e483a97de7	Adapt to new logging api	2022-02-19 21:41:38 +01:00
eikek	501c6f2988	Updating stanford corenlp to 4.3.2; adding more languages Models for Spanish exist and have now been added. The Hungarian language has also been added to the list of supported languages (mainly for tesseract, no nlp models)	2021-11-20 14:31:39 +01:00
eikek	9013f2de5b	Update scalafmt settings	2021-09-22 17:23:24 +02:00
eikek	9785db0683	Change license header of all files	2021-09-21 22:35:38 +02:00
eikek	637f11d0f6	Fix solr setup by adding a text_he field This field is used for the Hebrew language. Solr doesn't support it out of the box. The new field type is just a very basic field using the standard tokenizer and lowercase filter. It is very likely not providing good results. Hebrew is really difficult and it requires at least installing plugins for solr - this is out of scope for docspell. Users can set up their solr however they like and run a re-index afterwards.	2021-08-28 00:10:36 +02:00
wallace	589c41003f	Add hebrew document language	2021-08-24 01:19:42 +03:00
Scala Steward	e4fecefaea	Reformat with scalafmt 3.0.0	2021-08-19 08:50:30 +02:00
eikek	c59d4f8a6d	Add the japanese content field to solr This is a follow up on #961. It was forgotten when the japanese language was added.	2021-07-29 22:22:34 +02:00
eikek	8e5c88fd32	Add copyright header to source files	2021-07-04 10:57:53 +02:00
eikek	bd791b4593	Upgrade code base to CE3	2021-06-22 22:53:34 +02:00
eikek	ac7d00c28f	Refactor re-index task	2021-06-07 21:17:29 +02:00
eikek	5205ee0623	Store solr migration state in a solr document	2021-06-07 17:53:37 +02:00
Eike Kettner	ebaa31898e	Add missing solr migration for new language field	2021-03-12 00:16:00 +01:00
Eike Kettner	3f75af0807	Add 9 more languages to the list of document languages	2021-01-18 17:41:40 +01:00
Eike Kettner	94bb18c152	Refactor solr language fields	2021-01-18 17:41:40 +01:00
Eike Kettner	26dff18ae0	Add spanish as an example Adding a new language without nlp now only requires filling in these pieces: - define a list of month names to support date recognition - add it to joex' dockerfile to be available for tesseract - update the solr migration/field definitions - update the elm file so it shows up on the client	2021-01-18 17:41:40 +01:00
Eike Kettner	360cad3304	Refactoring solr/fts migration When re-indexing everything, skip intermediate populating the index and do this as the very last step. Parameterize adding new fields by their language.	2021-01-18 17:41:40 +01:00
Eike Kettner	f01646aeb5	Reorganize nlp pipeline and add nlp-unsupported language italian Improves and reorganizes how nlp pipelines are set up. Now users can choose from many options, depending on their hardware and usage scenario. This is the base to use more languages without depending on what stanford-nlp supports. Support is then limited to text extraction and simple regex-ner processing.	2021-01-18 17:41:40 +01:00
Eike Kettner	9c82f186d0	Add missing solr migration for french	2020-09-09 21:39:23 +02:00
Eike Kettner	fdb46da26d	Add french language and upgrade stanford-nlp to 4.0.0	2020-08-23 17:48:42 +02:00
Eike Kettner	259526a088	Organize imports	2020-07-12 13:51:52 +02:00
Eike Kettner	22fa1dba13	Apply folder restriction to fulltext only search And update index when folder changes.	2020-07-12 13:50:45 +02:00
Eike Kettner	aeba4ba913	Refactor full-text migrations and add folder to solr schema	2020-07-12 13:50:14 +02:00
Eike Kettner	347a029af8	Scalafix organize-imports	2020-06-28 21:20:47 +02:00
Eike Kettner	dc8f1a0387	Fix global re-index task to re-create the schema Otherwise new instances could not be re-indexed.	2020-06-25 23:02:06 +02:00
Eike Kettner	0ba1736bc8	Remove items/attachments from index on delete	2020-06-25 00:00:10 +02:00
Eike Kettner	14213c4c27	Allow some solr query options in the config file	2020-06-24 23:37:20 +02:00
Eike Kettner	532caed84c	Consistent logging of request/responses to solr Using a middleware. Also add missing changesets for mariadb.	2020-06-24 21:25:46 +02:00
Eike Kettner	47697a8056	Set some logs to trace	2020-06-24 01:16:13 +02:00
Eike Kettner	7d7460b1c9	Cleanup + hiding false errors from log	2020-06-24 00:23:22 +02:00
Eike Kettner	d5c9923a6d	Add a route that only searches the full-text index It returns the results in the same order as received from the index to preserve the relevance ordering.	2020-06-24 00:03:17 +02:00
Eike Kettner	e06a3f8fdd	ScalafmtAll	2020-06-23 00:18:59 +02:00
Eike Kettner	ffbb16db45	Transport highlighting information to the client	2020-06-23 00:17:29 +02:00
Eike Kettner	a58ffd11e1	Return attachment-name from index	2020-06-22 21:28:26 +02:00
Eike Kettner	3d82e03a8a	Remove solr query from debug log	2020-06-21 22:29:45 +02:00
Eike Kettner	cfe5aa8894	Use no-op fts-client if disabled + push this flag to the webui	2020-06-21 21:06:08 +02:00
Eike Kettner	14ea4091c4	Renaming things	2020-06-21 13:15:02 +02:00
Eike Kettner	9acea8307d	Update full-text index when changing data	2020-06-21 00:33:39 +02:00
Eike Kettner	383614f908	Allow updating single fields in solr	2020-06-20 23:37:47 +02:00
Eike Kettner	1f4ff0d4c4	Add language to schema, extend fts-client	2020-06-20 22:44:47 +02:00
Eike Kettner	3576c45d1a	First basic working solr search	2020-06-20 02:18:49 +02:00
Eike Kettner	2a0bf24088	Setup solr schema and index all data using a system task The task runs on application start. It sets the schema using solr's schema api and then indexes all data in the database. Each step is memorized so that it is not executed again on subsequent starts.	2020-06-19 21:37:22 +02:00
Eike Kettner	1f4220eccb	Index existing data in solr	2020-06-19 00:43:35 +02:00
Eike Kettner	60c079f664	Add task to index current database state	2020-06-18 22:38:45 +02:00
Eike Kettner	522daaf57e	Introducing fts client into codebase	2020-06-17 23:20:46 +02:00

51 Commits