Commit Graph

44 Commits

Author SHA1 Message Date
eikek
9013f2de5b Update scalafmt settings 2021-09-22 17:23:24 +02:00
eikek
9785db0683 Change license header of all files 2021-09-21 22:35:38 +02:00
eikek
637f11d0f6 Fix solr setup by adding a text_he field
This field is used for the Hebrew language. Solr doesn't support it out of
the box. The new field type is just a very basic field using the
standard tokenizer and lowercase filter. It very likely does not
provide good results. Hebrew is really difficult and requires at
least installing plugins for solr - this is out of scope for docspell.
Users can set up their solr however they like and run a re-index
afterwards.
2021-08-28 00:10:36 +02:00
wallace
589c41003f Add hebrew document language 2021-08-24 01:19:42 +03:00
Scala Steward
e4fecefaea Reformat with scalafmt 3.0.0 2021-08-19 08:50:30 +02:00
eikek
c59d4f8a6d Add the japanese content field to solr
This is a follow-up to #961. The field was forgotten when the japanese
language was added.
2021-07-29 22:22:34 +02:00
eikek
8e5c88fd32 Add copyright header to source files 2021-07-04 10:57:53 +02:00
eikek
bd791b4593 Upgrade code base to CE3 2021-06-22 22:53:34 +02:00
eikek
ac7d00c28f Refactor re-index task 2021-06-07 21:17:29 +02:00
eikek
5205ee0623 Store solr migration state in a solr document 2021-06-07 17:53:37 +02:00
Eike Kettner
ebaa31898e Add missing solr migration for new language field 2021-03-12 00:16:00 +01:00
Eike Kettner
3f75af0807 Add 9 more languages to the list of document languages 2021-01-18 17:41:40 +01:00
Eike Kettner
94bb18c152 Refactor solr language fields 2021-01-18 17:41:40 +01:00
Eike Kettner
26dff18ae0 Add spanish as an example
Adding a new language without nlp now only requires filling in these
pieces:

- define a list of month names to support date recognition
- add it to joex' dockerfile to be available for tesseract
- update the solr migration/field definitions
- update the elm file so it shows up on the client
2021-01-18 17:41:40 +01:00
Eike Kettner
360cad3304 Refactoring solr/fts migration
When re-indexing everything, skip populating the index at intermediate
steps and do this only as the very last step.

Parameterize adding new fields by their language.
2021-01-18 17:41:40 +01:00
Eike Kettner
f01646aeb5 Reorganize nlp pipeline and add nlp-unsupported language italian
Improves and reorganizes how nlp pipelines are set up. Now users can
choose from many options, depending on their hardware and usage
scenario.

This is the basis for using more languages without depending on what
stanford-nlp supports. Support is then limited to text extraction and
simple regex-ner processing.
2021-01-18 17:41:40 +01:00
Eike Kettner
9c82f186d0 Add missing solr migration for french 2020-09-09 21:39:23 +02:00
Eike Kettner
fdb46da26d Add french language and upgrade stanford-nlp to 4.0.0 2020-08-23 17:48:42 +02:00
Eike Kettner
259526a088 Organize imports 2020-07-12 13:51:52 +02:00
Eike Kettner
22fa1dba13 Apply folder restriction to fulltext only search
And update index when folder changes.
2020-07-12 13:50:45 +02:00
Eike Kettner
aeba4ba913 Refactor full-text migrations and add folder to solr schema 2020-07-12 13:50:14 +02:00
Eike Kettner
347a029af8 Scalafix organize-imports 2020-06-28 21:20:47 +02:00
Eike Kettner
dc8f1a0387 Fix global re-index task to re-create the schema
Otherwise new instances could not be re-indexed.
2020-06-25 23:02:06 +02:00
Eike Kettner
0ba1736bc8 Remove items/attachments from index on delete 2020-06-25 00:00:10 +02:00
Eike Kettner
14213c4c27 Allow some solr query options in the config file 2020-06-24 23:37:20 +02:00
Eike Kettner
532caed84c Consistent logging of request/responses to solr
Using a middleware. Also add missing changesets for mariadb.
2020-06-24 21:25:46 +02:00
Eike Kettner
47697a8056 Set some logs to trace 2020-06-24 01:16:13 +02:00
Eike Kettner
7d7460b1c9 Cleanup + hiding false errors from log 2020-06-24 00:23:22 +02:00
Eike Kettner
d5c9923a6d Add a route that only searches the full-text index
It returns the results in the same order as received from the index to
preserve the relevance ordering.
2020-06-24 00:03:17 +02:00
Eike Kettner
e06a3f8fdd ScalafmtAll 2020-06-23 00:18:59 +02:00
Eike Kettner
ffbb16db45 Transport highlighting information to the client 2020-06-23 00:17:29 +02:00
Eike Kettner
a58ffd11e1 Return attachment-name from index 2020-06-22 21:28:26 +02:00
Eike Kettner
3d82e03a8a Remove solr query from debug log 2020-06-21 22:29:45 +02:00
Eike Kettner
cfe5aa8894 Use no-op fts-client if disabled + push this flag to the webui 2020-06-21 21:06:08 +02:00
Eike Kettner
14ea4091c4 Renaming things 2020-06-21 13:15:02 +02:00
Eike Kettner
9acea8307d Update full-text index when changing data 2020-06-21 00:33:39 +02:00
Eike Kettner
383614f908 Allow updating single fields in solr 2020-06-20 23:37:47 +02:00
Eike Kettner
1f4ff0d4c4 Add language to schema, extend fts-client 2020-06-20 22:44:47 +02:00
Eike Kettner
3576c45d1a First basic working solr search 2020-06-20 02:18:49 +02:00
Eike Kettner
2a0bf24088 Setup solr schema and index all data using a system task
The task runs on application start. It sets the schema using solr's
schema api and then indexes all data in the database. Each step is
memorized so that it is not executed again on subsequent starts.
2020-06-19 21:37:22 +02:00
Eike Kettner
1f4220eccb Index existing data in solr 2020-06-19 00:43:35 +02:00
Eike Kettner
60c079f664 Add task to index current database state 2020-06-18 22:38:45 +02:00
Eike Kettner
522daaf57e Introducing fts client into codebase 2020-06-17 23:20:46 +02:00
Eike Kettner
c7f598e3b0 Initial module setup 2020-06-17 23:20:46 +02:00