Commit Graph

18 Commits

Author SHA1 Message Date
61d5585e68 Add Ukrainian language 2022-11-09 22:24:32 +01:00
c0feb13f63 Add Estonian language
Closes: #1646
2022-11-01 01:00:16 +01:00
5ec311c331 Add Polish to processing languages
Solr doesn't support Polish out of the box; plugins are required for
it. The language has been added with basic support only. For
better results, a manual setup of Solr is required.

Closes: #1345
2022-05-21 14:41:16 +02:00
9d69401fea Add Lithuanian to processing languages
Solr doesn't support Lithuanian; it can perhaps be added via plugins,
which would then require a manual setup of Solr. The language has been
added with basic support.

Closes: #1540
2022-05-21 14:36:01 +02:00
501c6f2988 Update Stanford CoreNLP to 4.3.2; add more languages
Models for Spanish are available and have now been added. Hungarian
has also been added to the list of supported languages (mainly for
Tesseract; there are no NLP models).
2021-11-20 14:31:39 +01:00
9785db0683 Change license header of all files 2021-09-21 22:35:38 +02:00
589c41003f Add Hebrew document language 2021-08-24 01:19:42 +03:00
f994d4b248 Add Japanese document language 2021-07-28 20:05:48 +02:00
21eb7dad94 Change headers of all Elm files 2021-07-25 14:00:11 +02:00
8e5c88fd32 Add copyright header to source files 2021-07-04 10:57:53 +02:00
e76d574ea3 Externalize strings for document language 2021-04-02 23:30:51 +02:00
9991ad5fcc Add Latvian language 2021-03-09 00:23:17 +01:00
3f75af0807 Add 9 more languages to the list of document languages 2021-01-18 17:41:40 +01:00
26dff18ae0 Add Spanish as an example
Adding a new language without NLP support now only requires filling in
these pieces:

- define a list of month names to support date recognition
- add it to joex's Dockerfile so it is available to Tesseract
- update the Solr migration/field definitions
- update the Elm file so it shows up on the client
2021-01-18 17:41:40 +01:00
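
The first item in the list above (month names for date recognition) can be illustrated with a minimal sketch. It is not the project's implementation: the `SPANISH_MONTHS` table, the `find_dates` function, and the "day [de] month [de] year" pattern are all assumptions made for illustration only.

```python
import re
from datetime import date

# Hypothetical month table for illustration; not taken from the project.
SPANISH_MONTHS = [
    "enero", "febrero", "marzo", "abril", "mayo", "junio",
    "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre",
]


def find_dates(text, months):
    """Return dates written as 'day [de] month [de] year' found in text.

    `months` is a list of twelve month names, January first.
    """
    month_index = {name.lower(): i + 1 for i, name in enumerate(months)}
    alternatives = "|".join(re.escape(name) for name in months)
    pattern = re.compile(
        rf"\b(\d{{1,2}})\s+(?:de\s+)?({alternatives})\s+(?:de\s+)?(\d{{4}})\b",
        re.IGNORECASE,
    )
    results = []
    for day, month, year in pattern.findall(text):
        try:
            results.append(date(int(year), month_index[month.lower()], int(day)))
        except ValueError:
            # Skip impossible dates such as "31 de febrero".
            continue
    return results
```

Under these assumptions, supporting another language amounts to supplying a different list of twelve month names (and, where needed, different connecting words in the pattern).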
f01646aeb5 Reorganize NLP pipeline and add NLP-unsupported language Italian
Improves and reorganizes how NLP pipelines are set up. Users can now
choose from many options, depending on their hardware and usage
scenario.

This is the basis for using more languages without depending on what
stanford-nlp supports. Support then involves text extraction and
simple regex-NER processing.
2021-01-18 17:41:40 +01:00
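
As a rough illustration of what "simple regex-NER processing" can mean for a language without NLP models, the sketch below tags occurrences of already known names using regular expressions. It is an assumption-laden example, not the project's code: `build_rules`, `tag_entities`, and the label names are hypothetical.

```python
import re


def build_rules(known_names):
    """Compile one case-insensitive regex per known name.

    `known_names` maps a label (e.g. "PERSON") to a list of names.
    """
    rules = []
    for label, names in known_names.items():
        for name in names:
            words = [re.escape(word) for word in name.split()]
            if not words:
                continue  # ignore empty names
            # Allow any amount of whitespace between the words of a name.
            pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
            rules.append((label, pattern))
    return rules


def tag_entities(text, rules):
    """Return (label, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in rules:
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group(0)))
    return [(label, value) for _, label, value in sorted(found)]
```

This approach needs no language model, only a list of names to look for, which is why it can serve languages that an NLP library does not cover.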
fdb46da26d Add French language and upgrade stanford-nlp to 4.0.0 2020-08-23 17:48:42 +02:00
2001cca88b Using elm-format for all files 2019-12-29 21:55:12 +01:00
831cd8b655 Initial version.
Features:

- Upload PDF files and have them analyzed
- Manage metadata and items
- See processing in the web app
2019-09-21 22:02:36 +02:00