| layout | title | permalink |
|---|---|---|
| docs | Features and Limitations | features |
Features
- Multi-account application
- Multiple users per account (multiple users can access the same account)
- Handle multiple documents as one unit
- OCR using Tesseract
- Full-text search based on Apache Solr
- Conversion to PDF: all files are converted into a PDF file. PDFs that contain only images (as scanners often produce) are converted into searchable PDF/A files.
- Non-destructive: uploaded files are never modified and can always be downloaded untouched
- Text is analysed to find and attach metadata automatically
- Manage document processing: cancel jobs, set priorities
- Everything is available via a documented REST API, which allows generating clients for (almost) any language
- Mobile-friendly web UI
- Create “share-urls” to upload files anonymously
- Send documents via e-mail
- E-mail notifications for documents with due dates
- Read your mailboxes via IMAP to import mails into docspell
- REST server and document processing are separate applications which can be scaled out independently
- Everything is stored in a SQL database: PostgreSQL, MariaDB or H2
  - H2 is an embedded, "one-file-only" database that avoids installing a database server
- Files supported:
  - Documents:
    - common MS Office (doc, docx, xls, xlsx)
    - OpenDocument (odt, ods)
    - RichText (rtf)
    - Images (jpg, png, tiff)
    - HTML
    - text/* (treated as Markdown)
  - Archives (extracted automatically, can be nested)
    - zip
    - eml (e-mail files in plain text MIME)
- Tooling:
  - Watch a folder: watch folders for changes and send files to docspell
  - Simple CLI for uploading files
  - Firefox plugin: right click on a link and send the file to docspell
  - SMTP Gateway: set up an SMTP server that delivers mails directly to docspell
- License: GPLv3
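
The "watch a folder" tooling listed above can be pictured with a small sketch. This is not docspell's actual tool, only an illustration of the idea under stated assumptions: the folder is polled rather than watched through OS notifications, files are recognised by a SHA-256 digest of their content, and `upload` is a hypothetical callback that stands in for sending a file to docspell. The names `file_digest` and `scan_once` are made up for this example.

```python
import hashlib
from pathlib import Path
from typing import Callable, Set


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large scans do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_once(folder: Path, seen: Set[str], upload: Callable[[Path], None]) -> int:
    """Hand every file with not-yet-seen content to `upload`.

    `upload` is a hypothetical callback (an assumption of this sketch);
    a real one would send the file to a docspell upload URL.
    Returns the number of files handed over in this pass.
    """
    count = 0
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        digest = file_digest(path)
        if digest in seen:
            # Same content was already sent, possibly under another name.
            continue
        upload(path)
        seen.add(digest)
        count += 1
    return count
```

Calling `scan_once` in a loop with a short `time.sleep` between passes gives a basic watcher. Tracking content digests rather than file names means a renamed or copied file is not sent twice, at the cost of re-reading every file on each pass.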
Limitations
These are the currently known limitations that may be of interest when considering docspell.
- Documents cannot be modified.
  - You can remove and add documents, but there is no versioning.