mirror of
https://github.com/TheAnachronism/docspell.git
synced 2025-06-22 02:18:26 +00:00
Reorganize NLP pipeline and add NLP-unsupported language Italian
Improves and reorganizes how NLP pipelines are set up. Users can now choose from many options, depending on their hardware and usage scenario. This is the basis for supporting more languages without depending on what stanford-nlp supports. Support for those languages then involves text extraction and simple regex-based NER processing.
@@ -11,6 +11,7 @@ type Language
     = German
     | English
     | French
+    | Italian


 fromString : String -> Maybe Language
@@ -24,6 +25,8 @@ fromString str =
     else if str == "fra" || str == "fr" || str == "french" then
         Just French

+    else if str == "ita" || str == "it" || str == "italian" then
+        Just Italian
     else
         Nothing

@@ -40,6 +43,9 @@ toIso3 lang =
         French ->
             "fra"

+        Italian ->
+            "ita"
+


 toName : Language -> String
 toName lang =
@@ -53,7 +59,10 @@ toName lang =
         French ->
             "French"

+        Italian ->
+            "Italian"
+


 all : List Language
 all =
-    [ German, English, French ]
+    [ German, English, French, Italian ]
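The diff touches five places for one new constructor: the type, `fromString`, `toIso3`, `toName`, and `all`. Elm's compiler rejects a `case` expression that misses a constructor, but it cannot notice a missing branch in the `if` chain of `fromString` or a missing entry in `all`. A minimal sketch of a round-trip check that would catch such an omission follows. It is not part of the commit: the module name `LanguageRoundTrip` and the helper `allRoundTrip` are hypothetical, and the type is reduced to the two constructors whose branches are fully visible in the diff.

```elm
module LanguageRoundTrip exposing (Language(..), allRoundTrip)

-- Reduced copy of the type for illustration only. German and English
-- are left out because their branches are not shown in the diff.


type Language
    = French
    | Italian


fromString : String -> Maybe Language
fromString str =
    if str == "fra" || str == "fr" || str == "french" then
        Just French

    else if str == "ita" || str == "it" || str == "italian" then
        Just Italian

    else
        Nothing


toIso3 : Language -> String
toIso3 lang =
    case lang of
        French ->
            "fra"

        Italian ->
            "ita"


all : List Language
all =
    [ French, Italian ]


-- Hypothetical helper: True when every language listed in `all`
-- survives toIso3 followed by fromString.
allRoundTrip : Bool
allRoundTrip =
    List.all (\lang -> fromString (toIso3 lang) == Just lang) all
```

If the `Italian` branch were missing from `fromString`, `allRoundTrip` would evaluate to `False`. The check only covers languages that are actually listed in `all`, so a constructor missing from that list would still go unnoticed.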