eikek
9013f2de5b
Update scalafmt settings
2021-09-22 17:23:24 +02:00
eikek
9785db0683
Change license header of all files
2021-09-21 22:35:38 +02:00
eikek
637f11d0f6
Fix solr setup by adding a text_he field
...
This field is used for the Hebrew language, which Solr doesn't support
out of the box. The new field type is a very basic one that uses only
the standard tokenizer and a lowercase filter, so it will likely not
give good results. Proper Hebrew support requires at least installing
plugins for Solr, which is out of scope for docspell.
Users can set up their Solr however they like and run a re-index
afterwards.
2021-08-28 00:10:36 +02:00
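The basic field type described in the commit above (standard tokenizer plus lowercase filter) could be expressed as Solr Schema API commands roughly as follows. This is a minimal sketch in Python, not the project's own code: the helper names and the `content_<lang>` field name are assumptions for illustration, while `add-field-type`, `add-field`, `solr.TextField`, `solr.StandardTokenizerFactory` and `solr.LowerCaseFilterFactory` are standard Solr names.

```python
import json


def text_field_type(lang: str) -> dict:
    """Schema API command for a very basic text type:
    standard tokenizer followed by a lowercase filter."""
    return {
        "add-field-type": {
            "name": f"text_{lang}",
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [{"class": "solr.LowerCaseFilterFactory"}],
            },
        }
    }


def content_field(lang: str) -> dict:
    """Schema API command adding a field of the type above.
    The field name "content_<lang>" is an assumption, not taken from the log."""
    return {
        "add-field": {
            "name": f"content_{lang}",
            "type": f"text_{lang}",
            "indexed": True,
            "stored": True,
        }
    }


if __name__ == "__main__":
    # Each command would be POSTed as JSON to the core's /schema endpoint.
    for command in (text_field_type("he"), content_field("he")):
        print(json.dumps(command, indent=2))
```

Users who install a dedicated Hebrew analysis plugin would replace the analyzer section with the plugin's own tokenizer and filters and then re-index.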
wallace
589c41003f
Add Hebrew document language
2021-08-24 01:19:42 +03:00
Scala Steward
e4fecefaea
Reformat with scalafmt 3.0.0
2021-08-19 08:50:30 +02:00
eikek
c59d4f8a6d
Add the Japanese content field to solr
...
This is a follow-up on #961. The field was forgotten when the Japanese
language was added.
2021-07-29 22:22:34 +02:00
eikek
8e5c88fd32
Add copyright header to source files
2021-07-04 10:57:53 +02:00
eikek
bd791b4593
Upgrade code base to CE3
2021-06-22 22:53:34 +02:00
eikek
ac7d00c28f
Refactor re-index task
2021-06-07 21:17:29 +02:00
eikek
5205ee0623
Store solr migration state in a solr document
2021-06-07 17:53:37 +02:00
Eike Kettner
ebaa31898e
Add missing solr migration for new language field
2021-03-12 00:16:00 +01:00
Eike Kettner
3f75af0807
Add 9 more languages to the list of document languages
2021-01-18 17:41:40 +01:00
Eike Kettner
94bb18c152
Refactor solr language fields
2021-01-18 17:41:40 +01:00
Eike Kettner
26dff18ae0
Add Spanish as an example
...
Adding a new language without nlp support now only requires filling in
these pieces:
- define a list of month names to support date recognition
- add it to joex' dockerfile to be available for tesseract
- update the solr migration/field definitions
- update the elm file so it shows up on the client
2021-01-18 17:41:40 +01:00
Eike Kettner
360cad3304
Refactoring solr/fts migration
...
When re-indexing everything, skip the intermediate steps that populate
the index and populate it once as the very last step.
Parameterize adding new fields by their language.
2021-01-18 17:41:40 +01:00
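The idea in the commit above (drop intermediate index-populating steps during a full re-index, and create language fields from a parameter) might be sketched like this. It is an illustrative Python sketch only; `Migration`, `plan_full_reindex` and `add_language_field` are hypothetical names and do not come from the docspell code base.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Migration:
    name: str
    populates_index: bool  # True if the step (re-)indexes data


def add_language_field(lang: str) -> Migration:
    """Schema-only migration, parameterized by the language code."""
    return Migration(f"add content field for {lang}", populates_index=False)


def plan_full_reindex(migrations: List[Migration]) -> List[Migration]:
    """Keep only the schema changes, then populate the index once at the end."""
    schema_only = [m for m in migrations if not m.populates_index]
    return schema_only + [Migration("index all data", populates_index=True)]


if __name__ == "__main__":
    history = [
        Migration("setup schema", False),
        Migration("index all data", True),
        add_language_field("fr"),
        Migration("index all data", True),
    ]
    for step in plan_full_reindex(history):
        print(step.name)
```

Indexing once at the end avoids repeating an expensive pass over all data for every schema change that happened along the way.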
Eike Kettner
f01646aeb5
Reorganize nlp pipeline and add nlp-unsupported language Italian
...
Improves and reorganizes how nlp pipelines are set up. Users can now
choose from many options, depending on their hardware and usage
scenario.
This is the basis for using more languages without depending on what
stanford-nlp supports. Support for such languages is then limited to
text extraction and simple regex-ner processing.
2021-01-18 17:41:40 +01:00
Eike Kettner
9c82f186d0
Add missing solr migration for French
2020-09-09 21:39:23 +02:00
Eike Kettner
fdb46da26d
Add French language and upgrade stanford-nlp to 4.0.0
2020-08-23 17:48:42 +02:00
Eike Kettner
259526a088
Organize imports
2020-07-12 13:51:52 +02:00
Eike Kettner
22fa1dba13
Apply folder restriction to fulltext only search
...
And update index when folder changes.
2020-07-12 13:50:45 +02:00
Eike Kettner
aeba4ba913
Refactor full-text migrations and add folder to solr schema
2020-07-12 13:50:14 +02:00
Eike Kettner
347a029af8
Scalafix organize-imports
2020-06-28 21:20:47 +02:00
Eike Kettner
dc8f1a0387
Fix global re-index task to re-create the schema
...
Otherwise new instances could not be re-indexed.
2020-06-25 23:02:06 +02:00
Eike Kettner
0ba1736bc8
Remove items/attachments from index on delete
2020-06-25 00:00:10 +02:00
Eike Kettner
14213c4c27
Allow some solr query options in the config file
2020-06-24 23:37:20 +02:00
Eike Kettner
532caed84c
Consistent logging of requests/responses to solr
...
Using a middleware. Also add missing changesets for mariadb.
2020-06-24 21:25:46 +02:00
Eike Kettner
47697a8056
Set some logs to trace
2020-06-24 01:16:13 +02:00
Eike Kettner
7d7460b1c9
Cleanup + hiding false errors from log
2020-06-24 00:23:22 +02:00
Eike Kettner
d5c9923a6d
Add a route that only searches the full-text index
...
It returns the results in the same order as received from the index to
preserve the relevance ordering.
2020-06-24 00:03:17 +02:00
Eike Kettner
e06a3f8fdd
ScalafmtAll
2020-06-23 00:18:59 +02:00
Eike Kettner
ffbb16db45
Transport highlighting information to the client
2020-06-23 00:17:29 +02:00
Eike Kettner
a58ffd11e1
Return attachment-name from index
2020-06-22 21:28:26 +02:00
Eike Kettner
3d82e03a8a
Remove solr query from debug log
2020-06-21 22:29:45 +02:00
Eike Kettner
cfe5aa8894
Use no-op fts-client if disabled + push this flag to the webui
2020-06-21 21:06:08 +02:00
Eike Kettner
14ea4091c4
Renaming things
2020-06-21 13:15:02 +02:00
Eike Kettner
9acea8307d
Update full-text index when changing data
2020-06-21 00:33:39 +02:00
Eike Kettner
383614f908
Allow updating single fields in solr
2020-06-20 23:37:47 +02:00
Eike Kettner
1f4ff0d4c4
Add language to schema, extend fts-client
2020-06-20 22:44:47 +02:00
Eike Kettner
3576c45d1a
First basic working solr search
2020-06-20 02:18:49 +02:00
Eike Kettner
2a0bf24088
Setup solr schema and index all data using a system task
...
The task runs on application start. It sets the schema using solr's
schema api and then indexes all data in the database. Each step is
recorded so that it is not executed again on subsequent starts.
2020-06-19 21:37:22 +02:00
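The behaviour described in the commit above, where each setup step is remembered and skipped on later starts, can be illustrated with a small Python sketch. The names here are hypothetical and the in-memory set only stands in for whatever persistent store the application uses to remember completed steps.

```python
from typing import Callable, List, Set, Tuple

# A step is a unique name plus the action to perform.
Step = Tuple[str, Callable[[], None]]


def run_pending(steps: List[Step], done: Set[str]) -> List[str]:
    """Run every step whose name is not yet in `done`, in order.

    A step is recorded only after its action returns, so a step that
    raises is retried on the next start. Returns the names executed now.
    """
    executed: List[str] = []
    for name, action in steps:
        if name in done:
            continue  # already applied on an earlier start
        action()
        done.add(name)
        executed.append(name)
    return executed


if __name__ == "__main__":
    completed: Set[str] = set()  # stand-in for a persistent store
    steps: List[Step] = [
        ("setup schema", lambda: print("setting schema")),
        ("index all data", lambda: print("indexing data")),
    ]
    run_pending(steps, completed)  # first start: both steps run
    run_pending(steps, completed)  # later start: nothing runs
```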
Eike Kettner
1f4220eccb
Index existing data in solr
2020-06-19 00:43:35 +02:00
Eike Kettner
60c079f664
Add task to index current database state
2020-06-18 22:38:45 +02:00
Eike Kettner
522daaf57e
Introducing fts client into codebase
2020-06-17 23:20:46 +02:00
Eike Kettner
c7f598e3b0
Initial module setup
2020-06-17 23:20:46 +02:00