docspell/modules/microsite/docs/features.md
Eike Kettner 3d49ceaab5 Use ocrmypdf tool to create pdf/a during conversion
- Use another external tool to convert PDF to PDF, which also adds the
  extracted text as an additional layer to the PDF.

- Although not currently used, the external conversion routine now
  checks for an existing text file with the same name as the PDF file
  but with the extension `.txt`. If present, it is included in the
  conversion result and used as the extracted text.

- Text extraction for PDF files now happens on the converted file,
  because it may already contain the text from the conversion step;
  this avoids running OCR twice.

- Errors during conversion are not fatal; processing continues
  without a converted file.
2020-07-18 17:19:29 +02:00
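The conversion behaviour described in the commit message above can be sketched in Python. This is a minimal illustration under stated assumptions, not docspell's own (Scala) implementation. The names `convert_pdf`, `pdf_for_extraction` and `ConversionResult` are hypothetical. The ocrmypdf flags `--skip-text`, `--output-type pdfa` and `--sidecar` are real ocrmypdf options, but whether docspell passes exactly these is an assumption. The sketch covers three points: the sidecar `.txt` lookup next to the output PDF, text extraction preferring the converted file, and conversion errors being non-fatal.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Sequence

# A runner executes a command; it is injectable so the sketch can be
# exercised without ocrmypdf being installed.
Runner = Callable[[Sequence[str]], None]


@dataclass
class ConversionResult:
    pdf: Optional[Path]   # converted file, or None if conversion failed
    text: Optional[str]   # text from a sidecar .txt file, if one exists


def run_command(cmd: Sequence[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(cmd, check=True)


def convert_pdf(src: Path, out_dir: Path, run: Runner = run_command) -> ConversionResult:
    target = out_dir / (src.stem + ".pdf")
    # The sidecar text file has the same name as the PDF, with extension .txt.
    sidecar = target.with_suffix(".txt")
    # Assumed invocation: these are real ocrmypdf flags, not necessarily docspell's.
    cmd = ["ocrmypdf", "--skip-text", "--output-type", "pdfa",
           "--sidecar", str(sidecar), str(src), str(target)]
    try:
        run(cmd)
    except Exception:
        # Errors are not fatal: processing continues without a converted file.
        return ConversionResult(pdf=None, text=None)
    if not target.exists():
        return ConversionResult(pdf=None, text=None)
    text = sidecar.read_text(encoding="utf-8") if sidecar.exists() else None
    return ConversionResult(pdf=target, text=text)


def pdf_for_extraction(original: Path, result: ConversionResult) -> Path:
    # Prefer the converted file: it may already contain a text layer from
    # the conversion step, so OCR does not have to run a second time.
    return result.pdf if result.pdf is not None else original
```

If `result.text` is set, a caller could use it directly and skip extraction entirely. Otherwise it would extract text from `pdf_for_extraction(src, result)`, which falls back to the original upload when conversion failed.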


layout: docs
title: Features and Limitations
permalink: features

Features

  • Multi-account application
  • Multiple users per account (multiple users can access the same account)
  • Handle multiple documents as one unit
  • OCR using tesseract
  • Full-Text Search based on Apache SOLR
  • Conversion to PDF: all files are converted into a PDF file. PDFs containing only images (as scanners often produce) are converted into searchable PDF/A files.
  • Non-destructive: uploaded files are never modified and can always be downloaded in their original form
  • Text is analysed to find and attach metadata automatically
  • Manage document processing: cancel jobs, set priorities
  • Everything is available via a documented REST API, which allows generating clients for (almost) any language
  • Mobile-friendly web UI
  • Create “share-urls” to upload files anonymously
  • Send documents via e-mail
  • E-mail notifications for documents with due dates
  • Import e-mails into docspell by reading your mailboxes via IMAP
  • REST server and document processing are separate applications which can be scaled-out independently
  • Everything stored in a SQL database: PostgreSQL, MariaDB or H2
    • H2 is an embedded, "one-file-only" database, so no separate database server needs to be installed
  • Files supported:
    • Documents:
      • PDF
      • common MS Office (doc, docx, xls, xlsx)
      • OpenDocument (odt, ods)
      • RichText (rtf)
      • Images (jpg, png, tiff)
      • HTML
      • text/* (treated as Markdown)
    • Archives (extracted automatically, can be nested)
      • zip
      • eml (e-mail files in plain text MIME)
  • Tooling:
  • License: GPLv3
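Automatic extraction of nested archives, as listed under "Archives" above, can be illustrated with Python's standard `zipfile` module. This is a hypothetical sketch, not docspell's code. `extract_nested` and its `max_depth` guard against deep nesting are illustrative names. The function descends into members whose names end in `.zip` and yields every other file together with its path inside the archive. It decides by file name rather than by `zipfile.is_zipfile`, because formats such as docx and odt are themselves zip containers and must be passed on as documents. E-mail (`eml`) files are not covered here.

```python
import io
import zipfile
from typing import Iterator, Tuple


def extract_nested(data: bytes, prefix: str = "", max_depth: int = 5) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, content) for each file in a zip, descending into nested zips."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            content = archive.read(info)
            path = prefix + info.filename
            # Decide by file name: docx/xlsx/odt are zip containers too and
            # must stay documents, so zipfile.is_zipfile would be wrong here.
            if info.filename.lower().endswith(".zip") and max_depth > 0:
                # Nested archive: recurse, prefixing member paths with its name.
                yield from extract_nested(content, path + "/", max_depth - 1)
            else:
                yield path, content
```

For example, an upload containing `a.pdf` and `docs/nested.zip` (which holds `b.txt`) yields `a.pdf` and `docs/nested.zip/b.txt`. With `max_depth=0`, the inner archive is returned unopened as `docs/nested.zip`.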

Limitations

These are the currently known limitations that may be of interest when evaluating docspell.

  • Documents cannot be modified.
  • You can remove and add documents but there is no versioning.