Commit Graph

897 Commits

Author SHA1 Message Date
b63cf74a2d Merge pull request #66 from scala-steward/update/sbt-sonatype-3.9.2
Update sbt-sonatype to 3.9.2
2020-03-27 22:55:20 +01:00
6a1297fc95 Add a limit for text analysis 2020-03-27 22:54:49 +01:00
e035b8e985 Update sbt-sonatype to 3.9.2 2020-03-27 22:29:05 +01:00
14a25fe23e Fix serializing mediatype parameters 2020-03-27 21:50:06 +01:00
aed5dfaff6 Fix mimetype extractors 2020-03-27 21:49:55 +01:00
74fb0d994f Add new options to nix module 2020-03-27 20:16:18 +01:00
75405dbcba Update documentation 2020-03-27 20:16:18 +01:00
f9d8016dc9 Merge pull request #65 from scala-steward/update/sbt-sonatype-3.9.1
Update sbt-sonatype to 3.9.1
2020-03-27 12:36:11 +01:00
2d0fcfbe2f Update sbt-sonatype to 3.9.1 2020-03-27 10:43:38 +01:00
16edf84752 Setup new site 2020-03-27 00:35:15 +01:00
9656ba62f4 scalafmtAll 2020-03-26 18:26:00 +01:00
09ea724c13 Store message-id of eml files 2020-03-25 22:00:51 +01:00
6b13993257 Merge pull request #64 from scala-steward/update/http4s-blaze-client-0.21.2
Update http4s-blaze-client, ... to 0.21.2
2020-03-25 08:00:36 +01:00
fe2b27bd49 Update http4s-blaze-client, ... to 0.21.2 2020-03-25 04:13:16 +01:00
43efb4e6ba Use doobie support from emil project 2020-03-24 23:40:29 +01:00
e305b46708 Extract tnef attachments and fix incomplete html
wkhtmltopdf requires the content encoding to be set correctly in the
document.
2020-03-24 23:40:29 +01:00
0b80572664 Fix encodings for mails with non-utf8 html parts 2020-03-24 23:40:29 +01:00
012f86994a Merge pull request #63 from scala-steward/update/flyway-core-6.3.2
Update flyway-core to 6.3.2
2020-03-24 14:55:35 +01:00
519a39c991 Update flyway-core to 6.3.2 2020-03-24 14:17:49 +01:00
cf7ccd572c Improve handling encodings
HTML and text files are no longer assumed to be UTF-8. The encoding is
now detected, which may not work for all files. The default/fallback is
UTF-8.

There is still a problem with mails that contain HTML parts not in
UTF-8 encoding. The mail text is always returned as a string and the
original encoding is lost. The HTML is then stored as UTF-8 bytes, but
wkhtmltopdf reads it as latin1. It seems that the `--encoding` setting
doesn't override the encoding provided by the document.
2020-03-23 22:51:28 +01:00
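
The encoding handling described in this commit can be illustrated with a short sketch. This is not the project's code; it is a Python illustration in which `decode_with_fallback` and its `detected` parameter are hypothetical names, and the charset detection itself is assumed to happen elsewhere. The last lines reproduce the remaining problem: UTF-8 bytes read back as latin1.

```python
import codecs
from typing import Optional


def decode_with_fallback(data: bytes, detected: Optional[str]) -> str:
    """Decode bytes using a detected charset, falling back to UTF-8.

    `detected` stands in for the result of some charset detector
    (an assumption here); None or an unknown name triggers the fallback.
    """
    charset = "utf-8"
    if detected:
        try:
            charset = codecs.lookup(detected).name
        except LookupError:
            pass  # unknown charset name: keep the UTF-8 fallback
    # errors="replace" keeps decoding from failing on malformed input
    return data.decode(charset, errors="replace")


# The remaining problem: text stored as UTF-8 bytes but read as latin1.
stored = "é".encode("utf-8")        # two bytes: 0xC3 0xA9
misread = stored.decode("latin-1")  # two characters instead of one
```

Reading the stored bytes with the wrong charset does not raise an error; it silently yields different characters, which is why the declared encoding in the document matters.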
b265421a46 Allow using the browser's PDF viewer
The viewerjs library has some limitations. Sometimes PDFs are quite
blurry and some content is displayed scrambled. Switching to the
browser's built-in PDF viewer (for chromium and firefox) fixes this. So
while on mobile viewerjs is the only working viewer, for desktop use
it might be desirable to use the browser's built-in viewer instead.
2020-03-22 22:03:43 +01:00
75ead33652 Provide a download link to the original archive file 2020-03-22 21:48:49 +01:00
8ff83e4f06 Update Changelog 2020-03-22 21:39:17 +01:00
7e6eec9533 Include archive infos in item detail 2020-03-22 21:35:50 +01:00
cbc95b11e6 Add routes to retrieve the archive of an attachment 2020-03-22 21:21:49 +01:00
9a99c852a8 Fix typo in search menu 2020-03-22 21:08:01 +01:00
a165f87fea Update changelog 2020-03-20 23:01:50 +01:00
3703dce9a6 Update fs2 to 2.3.0 2020-03-20 22:47:09 +01:00
f5a48c544f Merge pull request #61 from scala-steward/update/sbt-microsites-1.1.5
Update sbt-microsites to 1.1.5
2020-03-20 22:39:27 +01:00
cba466ed47 Set item due date candidate
After processing, set the due date of an item to the first candidate.
The earliest due date is considered the best match.
2020-03-20 22:39:09 +01:00
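
The selection rule in this commit (earliest candidate wins) can be sketched as follows. This is a Python illustration under assumptions, not the project's code; `pick_due_date` is a hypothetical name, and candidates are assumed to be plain dates.

```python
from datetime import date
from typing import Optional, Sequence


def pick_due_date(candidates: Sequence[date]) -> Optional[date]:
    """Return the earliest candidate, or None when there are none."""
    return min(candidates, default=None)
```

Returning None for an empty list leaves the item's due date unset when the analysis finds no candidates.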
0903bdf6b3 Update sbt-microsites to 1.1.5 2020-03-20 12:45:58 +01:00
fd48dace9d Merge pull request #60 from scala-steward/update/sbt-mdoc-2.1.5
Update sbt-mdoc to 2.1.5
2020-03-20 10:40:14 +01:00
ab599543ab Merge pull request #59 from scala-steward/update/mariadb-java-client-2.6.0
Update mariadb-java-client to 2.6.0
2020-03-20 10:39:57 +01:00
67a96d95c8 Update sbt-mdoc to 2.1.5 2020-03-20 09:08:40 +01:00
6477745b77 Update mariadb-java-client to 2.6.0 2020-03-20 09:08:34 +01:00
7c4e4bb076 Merge pull request #57 from eikek/feature/archives
Feature/archives
2020-03-19 23:11:06 +01:00
74a6cf1dd1 Remove unused migration directory 2020-03-19 22:43:41 +01:00
b1a1a2b837 Add archives to collective insights 2020-03-19 22:43:18 +01:00
d78bd4142c Update documentation 2020-03-19 22:42:58 +01:00
439aaee27b Search archives when looking for files via checksum 2020-03-19 22:42:48 +01:00
6b1156182c Add support for eml (rfc822 email) files 2020-03-19 22:42:40 +01:00
4ed7a137f7 Add support for archive files
Each attachment is now first extracted into potentially multiple ones,
if it is recognized as an archive. This is the first step in
processing. The original archive file is also stored and the resulting
attachments are associated with their original archive.

First support is implemented for zip files.
2020-03-19 22:42:27 +01:00
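
The extraction step described in this commit can be sketched roughly as below. This is a Python illustration under assumptions, not the project's implementation; the `Attachment` class, its `archive` field and `expand_attachment` are hypothetical names. It handles only zip files, matching the first supported format.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attachment:
    name: str
    data: bytes
    archive: Optional[str] = None  # name of the archive it came from, if any


def expand_attachment(att: Attachment) -> List[Attachment]:
    """Return the files inside a zip attachment, or the attachment itself."""
    buf = io.BytesIO(att.data)
    if not zipfile.is_zipfile(buf):
        return [att]  # not an archive: pass through unchanged
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        # Each entry becomes its own attachment, linked to the archive name.
        return [
            Attachment(info.filename, zf.read(info), archive=att.name)
            for info in zf.infolist()
            if not info.is_dir()
        ]
```

The caller keeps the original `Attachment` as well, so the archive file can still be stored and offered for download alongside its extracted entries.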
2a7066650f Merge pull request #56 from eikek/feature/analysis
Feature/analysis
2020-03-18 00:00:47 +01:00
10f3d5b7ed Fix bug to select other attachments 2020-03-17 22:37:43 +01:00
f0449dd2ce Properly initialize thread pools 2020-03-17 22:37:12 +01:00
00ca6b5697 Improve text analysis
- Search for consecutive labels

- Sort list of candidates by a weight

- Search for organizations using person labels
2020-03-17 22:34:50 +01:00
2da0f91052 Merge pull request #55 from scala-steward/update/tika-core-1.24
Update tika-core to 1.24
2020-03-17 18:03:55 +01:00
736a28bc4f Update tika-core to 1.24 2020-03-17 16:43:17 +01:00
a4c97d5d57 Merge pull request #54 from scala-steward/update/sbt-mdoc-2.1.4
Update sbt-mdoc to 2.1.4
2020-03-16 16:43:54 +01:00
1125cc872c Update sbt-mdoc to 2.1.4 2020-03-16 16:21:51 +01:00