mirror of https://github.com/TheAnachronism/docspell.git synced 2025-10-06 19:27:12 +00:00

Eike Kettner 3d49ceaab5 Use ocrmypdf tool to create pdf/a during conversion

- Use another external tool to convert pdf to pdf which also adds the
  extracted text as another layer into the pdf

- Although not used, the external conversion routine will now check
  for an existing text file that is named as the pdf file with extension
  `.txt`. If present it is included in the conversion result and will be
  used as the extracted text.

- Text extraction for pdf files now happens on the converted file,
  because it may already contain the text from the conversion step;
  this avoids running OCR twice.

- Errors during conversion are never fatal; processing continues
  without a converted file.
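
The conversion behaviour described in this commit message can be sketched roughly as follows. This is only an illustration in Python, not docspell's Scala implementation: the names `ConversionResult` and `convert_pdf` are hypothetical, the timeout is arbitrary, and the sketch assumes that "named as the pdf file with extension `.txt`" means the `.pdf` suffix is replaced by `.txt`.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass
class ConversionResult:
    pdf: Optional[Path]  # converted file, or None if the conversion failed
    text: Optional[str]  # text found next to the converted file, if any


def convert_pdf(command: Sequence[str], out_pdf: Path) -> ConversionResult:
    """Run an external converter and pick up an optional sidecar text file.

    `command` is the full command line; it is expected to write `out_pdf`.
    """
    try:
        subprocess.run(command, check=True, capture_output=True, timeout=600)
    except (OSError, subprocess.SubprocessError):
        # Conversion errors are not fatal: carry on without a converted file.
        return ConversionResult(pdf=None, text=None)

    if not out_pdf.is_file():
        return ConversionResult(pdf=None, text=None)

    # Assumption: the text file sits next to the pdf, with suffix .txt.
    sidecar = out_pdf.with_suffix(".txt")
    text = None
    if sidecar.is_file():
        text = sidecar.read_text(encoding="utf-8", errors="replace")
    return ConversionResult(pdf=out_pdf, text=text)
```

A caller might pass something like `["ocrmypdf", "in.pdf", "out.pdf"]` as the command; the exact options docspell passes to the tool are not stated here. If `pdf` is `None`, processing would continue with the original file, and if `text` is present it could be used instead of running a separate extraction step.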

2020-07-18 17:19:29 +02:00

artwork

Upgrade microsite

2019-12-30 02:33:46 +01:00

docker

Use ocrmypdf tool to create pdf/a during conversion

2020-07-18 17:19:29 +02:00

modules

Use ocrmypdf tool to create pdf/a during conversion

2020-07-18 17:19:29 +02:00

nix

Use ocrmypdf tool to create pdf/a during conversion

2020-07-18 17:19:29 +02:00

project

Update sbt-scalafix to 0.9.19

2020-07-14 16:21:29 +02:00

tools

Fixes for consumedir.sh

2020-06-28 13:37:39 +02:00

.gitignore

Add support for integrating into nix/nixos

2020-01-20 00:21:15 +01:00

.mergify.yml

Extend mergify to merge my own prs on ci success

2020-06-13 14:53:38 +02:00

.scala-steward.conf

Don't update stanford-nlp just yet

2020-04-22 21:06:42 +02:00

.scalafix.conf

Add scalafix and organize-imports rule

2020-06-28 21:20:47 +02:00

.scalafmt.conf

Update scalafmt-core to 2.6.3

2020-07-10 22:34:54 +02:00

.travis.yml

Update scala to 2.13.2

2020-04-24 22:24:31 +02:00

build.sbt

Add sbt alias for reformatting

2020-07-14 23:18:39 +02:00

Changelog.md

Update documentation and fix changelog wording

2020-06-29 20:37:52 +02:00

elm-analyse.json

Fix elm-analyse issues

2020-01-29 20:56:14 +01:00

elm-package.json

Add scalafmt.conf and elm compile options

2019-12-29 20:52:43 +01:00

elm.json

Throttle search requests

2020-06-13 21:17:15 +02:00

LICENSE.txt

Initial version.

2019-09-21 22:02:36 +02:00

NOTICE.txt

Improve handling encodings

2020-03-23 22:51:28 +01:00

README.md

Fix typo in README

2020-05-23 09:26:07 +02:00

version.sbt

Set version to 0.9.0-SNAPSHOT

2020-06-29 21:04:15 +02:00

README.md

Docspell

Docspell is a personal document organizer. You'll need a scanner to convert your papers into PDF files. Docspell can then assist in organizing the resulting mess 😉.

You can attach tags, a correspondent, what the document is concerned with, a name, a date and more. Once your documents carry this metadata, you should be able to find them quickly using the search feature. But adding it manually to each document is a tedious task. What if most of it could be done automatically?

It is provided as a REST server and a web application.

How it works

Documents have two main properties: a correspondent (a sender or receiver that is not you) and something the document is about. Usually it is about a person or a thing – maybe your car, or contracts concerning some family member, etc.

You maintain a kind of address book. It should list all possible correspondents and the concerning people/things. This grows incrementally with each new unknown document.
When docspell analyzes a document, it tries to find matches within your address book. It can detect the correspondent and a concerning person or thing. It will then associate this data with your documents.
You can inspect what docspell has done and correct it. If docspell has found multiple suggestions, they will be shown for you to select one. If it is not correctly associated, very often the correct one is just one click away.

The metadata that docspell draws its suggestions from must be maintained manually. But this data usually doesn't grow as fast as the documents do. After a while the address book is fairly complete and needs to be revisited only once in a while.
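
To make the matching step more concrete, here is a deliberately naive sketch in Python. It is not docspell's actual analysis, which is not described in this README: it simply counts case-insensitive occurrences of each address book name in the document text. The names `Entry` and `suggest`, and the two `kind` values, are made up for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    name: str
    kind: str  # "correspondent" or "concerning" (hypothetical labels)


def suggest(text: str, address_book: List[Entry], kind: str) -> List[Entry]:
    """Return entries of the given kind whose name occurs in the text,
    ordered by number of occurrences (most frequent first)."""
    haystack = text.casefold()
    hits = []
    for entry in address_book:
        if entry.kind != kind:
            continue
        count = haystack.count(entry.name.casefold())
        if count > 0:
            hits.append((count, entry))
    # Stable sort: entries with equal counts keep their address book order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in hits]
```

The first element of the returned list would be the association made automatically, and the remaining elements correspond to the alternative suggestions offered for selection. An empty list means the document mentions nobody from the address book yet, which is the point at which a new entry would be added.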

Install

Install the provided deb file on your Debian-based system.
Download the provided zip file and run the script in bin/, as described here.
Use the nix package manager, as described here. A NixOS module is available, too.
Use Docker, as described here.

Documentation

The documentation site provides more information.

Check the feature list and the quickstart guide to try it out.

Screenshots

Here are some (outdated) screenshots to give a first impression of the web UI.

Languages

Elm 47.2%

Scala 44.2%

Nix 2.3%

Java 2.2%

Rich Text Format 2%

Other 2%