Every document, findable in seconds
Most businesses manage documents the same way they always have: shared drives with folder structures that made sense when they were created, email attachments that never get saved anywhere, physical paper that gets scanned into a folder called "Scans 2024" and never touched again. Paperless-ngx is the open source document management system that brings order to that chaos: automatic OCR on every document, full-text search across your entire archive, intelligent tagging and classification, and a web interface that makes finding any document a seconds-long operation rather than a minutes-long hunt. Node deploys and manages Paperless-ngx on infrastructure you control. Your documents stay where they belong: with you.
What Paperless-ngx is
Paperless-ngx is the community-maintained successor to paperless-ng, itself a fork of the original Paperless project. It is a self-hosted document management system for organisations that want to digitise their document archive and make it searchable without sending documents to a third-party SaaS platform.
The platform takes documents from any source (scanners, email inboxes, file uploads, watched folders), runs OCR to extract their text content, and stores them in a searchable archive with automatic metadata extraction, manual and automatic tagging, correspondent tracking, and a powerful search engine. Every document your business handles is findable by its content, not just its filename.
Paperless-ngx is widely used by organisations with significant document volumes: legal and professional services firms, healthcare organisations, financial services businesses, local authorities, and any organisation that handles large volumes of contracts, correspondence, invoices, regulatory filings or compliance documentation.
Document ingestion from any source
Paperless-ngx is designed to accept documents from wherever they currently live, without requiring a change to how people work.
Email consumption: configure Paperless-ngx to monitor one or more email inboxes and automatically ingest attachments. Anything arriving by email, including invoices from suppliers, correspondence from clients and regulatory notices from authorities, is captured, OCR'd and indexed automatically. You define rules that assign metadata based on sender, subject or content.
Watched folders: configure network folders that Paperless-ngx monitors continuously. Any file placed in a watched folder (by a scan from a network-connected scanner, by a file transfer from another system, or by a user saving a document) is automatically consumed, processed and indexed. The folder structure that people already use becomes an ingestion mechanism.
Direct upload: the web interface accepts documents by drag-and-drop. Users who find a document on their desktop can push it into Paperless-ngx in seconds without any special process.
Scanner integration: network scanners that support scan-to-folder or scan-to-email work natively with Paperless-ngx's ingestion mechanisms. Physical documents scanned at any networked scanner appear in the archive within minutes.
Mobile capture: the Paperless-ngx web interface is fully responsive. Staff in the field can photograph documents on a mobile device, upload through the browser, and have the document in the archive and searchable as soon as processing completes.
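As an illustration of the watched-folder route, the sketch below shows how another system might hand a file to Paperless-ngx. The function name, the staging directory and all paths are hypothetical, not part of Paperless-ngx. The file is copied to a staging directory first and then moved into the watched folder in one step, so the consumer should never pick up a half-written file.

```python
import os
import shutil
from pathlib import Path


def drop_into_consume_folder(source: Path, consume_dir: Path, staging_dir: Path) -> Path:
    """Copy `source` into the watched folder and return its final path.

    `consume_dir` stands for whatever folder your Paperless-ngx instance
    watches; `staging_dir` is a hypothetical scratch directory that should
    live on the same filesystem as `consume_dir`.
    """
    consume_dir.mkdir(parents=True, exist_ok=True)
    staging_dir.mkdir(parents=True, exist_ok=True)

    staged = staging_dir / source.name
    final_path = consume_dir / source.name

    # Write the full copy outside the watched folder first.
    shutil.copyfile(source, staged)

    # os.replace is atomic only when both paths are on the same filesystem;
    # across filesystems it raises OSError instead of moving the file.
    os.replace(staged, final_path)
    return final_path
```

A call might look like `drop_into_consume_folder(Path("invoice.pdf"), Path("/srv/paperless/consume"), Path("/srv/paperless/staging"))`, with both directories adjusted to match your own deployment.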
OCR and text extraction
The difference between a searchable document archive and a folder full of PDFs is OCR: optical character recognition that extracts the text content of scanned images and makes it searchable.
Automatic OCR: every scanned document and image ingested by Paperless-ngx is automatically processed by Tesseract, one of the most accurate open source OCR engines available, which Paperless-ngx runs through OCRmyPDF. The extracted text is indexed and becomes fully searchable. A 1,000-page scanned contract archive becomes as searchable as a database.
Multi-language OCR: Tesseract supports over 100 languages. Organisations with international operations and multi-language document archives get accurate OCR across all their document languages.
PDF text layer: for PDFs that already contain a text layer (digitally created PDFs rather than scans), Paperless-ngx extracts the text directly without running OCR, maintaining higher accuracy while processing faster.
Searchable PDFs: when Paperless-ngx processes a scanned document, it creates a searchable PDF that contains both the original scanned image and an invisible text layer. The document is visually identical to the original and is also fully text-searchable when downloaded.
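The OCR behaviour described above is driven by a small number of settings. The sketch below builds the two environment variables most relevant here: `PAPERLESS_OCR_LANGUAGE`, which takes Tesseract language codes joined with `+`, and `PAPERLESS_OCR_MODE`, where `skip` leaves existing text layers alone. The helper function itself is hypothetical, and the language codes you need depend on which Tesseract language packs are installed in your deployment.

```python
def ocr_environment(languages: list[str], skip_existing_text: bool = True) -> dict[str, str]:
    """Return OCR-related environment variables for a Paperless-ngx container.

    `languages` holds Tesseract language codes such as "eng" or "deu".
    This helper is illustrative only; it is not part of Paperless-ngx.
    """
    if not languages:
        raise ValueError("at least one Tesseract language code is required")
    return {
        # Tesseract accepts several languages joined with "+".
        "PAPERLESS_OCR_LANGUAGE": "+".join(languages),
        # "skip" keeps text layers that already exist; "redo" re-runs OCR
        # and replaces them.
        "PAPERLESS_OCR_MODE": "skip" if skip_existing_text else "redo",
    }
```

For an archive containing English and German documents, `ocr_environment(["eng", "deu"])` yields the values to place in the container's environment.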
Search and retrieval
Finding documents in Paperless-ngx is fast because the entire text content of every document is indexed.
Full-text search: search across the complete text content of your entire document archive. Type any phrase, reference number, name, address, or any other text that appears anywhere in a document and Paperless-ngx finds it. A three-year-old supplier invoice is as findable as a document created yesterday.
Advanced search syntax: combine search terms with boolean operators, restrict to date ranges, filter by correspondent, type, tag or custom field. Find all contracts from a specific supplier signed in a particular year. Find every document containing a reference number. Find all invoices above a certain value. The search engine handles complex queries as easily as simple ones.
Saved views: create named saved views for search queries you run regularly. Your accounts team gets a view showing all outstanding invoices. Your compliance team gets a view showing all regulatory filings from the current period. Each user gets the subset of the archive most relevant to their work.
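The same searches are available to other systems through the Paperless-ngx REST API. The sketch below builds a full-text search request using the `query` parameter of the documents endpoint and token authentication. The host name and the helper functions are hypothetical, and the example query uses the field syntax from the Paperless-ngx search documentation (`type:`, `correspondent:`, `created:[... to ...]`). It only constructs the request and does not send it.

```python
from urllib.parse import urlencode


def search_url(base_url: str, query: str, page_size: int = 25) -> str:
    """Build a full-text search URL for the documents endpoint."""
    params = {"query": query, "page_size": page_size}
    return f"{base_url.rstrip('/')}/api/documents/?{urlencode(params)}"


def auth_headers(api_token: str) -> dict[str, str]:
    """Headers for token authentication against the REST API."""
    return {"Authorization": f"Token {api_token}"}


# Hypothetical example: invoices from one supplier created in 2023 or 2024.
example = search_url(
    "https://docs.example.com",
    "type:invoice correspondent:acme created:[2023 to 2024]",
)
```

The resulting URL can be fetched with any HTTP client, passing the headers from `auth_headers`.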
Automatic classification and tagging
Paperless-ngx includes a machine learning classifier that learns from the tags and metadata you apply and starts applying them automatically.
Correspondent detection: the classifier learns to identify which correspondent a document is from based on text content patterns. A document from a supplier you deal with regularly is automatically tagged with the correct correspondent without any manual intervention.
Document type classification: the classifier learns to distinguish invoice from contract from correspondence from regulatory filing. Documents are automatically assigned the correct type based on learned patterns, giving your team pre-sorted document queues rather than an undifferentiated inbox.
Tag automation: define rules that apply tags based on content patterns, correspondent, document type or date. All documents from a specific sender get a particular tag. All documents containing a specific phrase get tagged for compliance review. Automation rules mean your archive stays organised without manual effort.
Custom fields: extend Paperless-ngx's metadata model with custom fields specific to your business. Any metadata your business needs to track, such as invoice number, contract value, expiry date, matter reference or project code, is a custom field that can be searched and filtered.
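To make the rule-based side of tagging concrete, here is a simplified sketch of content matching. Paperless-ngx offers matching modes along these lines (any word, all words, exact phrase, regular expression), but the code below is an illustration written for this page, not the project's implementation, and the rule and tag names are invented.

```python
import re


def matches(text: str, pattern: str, algorithm: str) -> bool:
    """Return True if `text` satisfies `pattern` under the given algorithm."""
    lowered = text.lower()
    words = pattern.lower().split()

    def has_word(word: str) -> bool:
        # Whole-word match, case-insensitive via the lowered text.
        return re.search(rf"\b{re.escape(word)}\b", lowered) is not None

    if algorithm == "any":
        return any(has_word(w) for w in words)
    if algorithm == "all":
        return all(has_word(w) for w in words)
    if algorithm == "exact":
        return has_word(pattern.lower())
    if algorithm == "regex":
        return re.search(pattern, text, flags=re.IGNORECASE) is not None
    raise ValueError(f"unknown algorithm: {algorithm}")


def tags_for(text: str, rules: list[tuple[str, str, str]]) -> list[str]:
    """Apply (tag, pattern, algorithm) rules and return the matching tags."""
    return [tag for tag, pattern, algorithm in rules if matches(text, pattern, algorithm)]


# Hypothetical rules for illustration.
RULES = [
    ("compliance-review", "data protection", "exact"),
    ("finance", "invoice receipt", "any"),
    ("has-reference", r"inv-\d{4}-\d{4}", "regex"),
]
```

Calling `tags_for(document_text, RULES)` returns the tags whose rules match, in rule order.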
Security and access control
Documents are business assets and, in many cases, legally sensitive. Paperless-ngx provides proper access controls so the right people see the right documents.
User groups and permissions: configure user groups with access to specific document sets. The finance team sees invoices and financial documents. The legal team sees contracts and correspondence. HR documents are accessible only to HR and management. Access control is applied at the document level, not just the folder level.
Document encryption: our managed deployments store documents on encrypted volumes with encrypted backups, an infrastructure-level control Node provides around Paperless-ngx. Access requires authenticated login, and documents are not accessible from the network without valid credentials.
Audit logging: document access, downloads and modifications are logged. Your compliance team has a full audit trail of who accessed which documents and when, which supports responses to data subject access requests and regulatory audits.
GDPR retention policies: configure automated deletion of documents after defined retention periods. Personal data that should not be held beyond a specific period is automatically purged. Right-to-erasure requests can be handled by searching for the data subject's name and removing associated documents.
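How a retention sweep is implemented depends on the deployment. As one illustration, the sketch below computes a cutoff date and builds a REST API request listing documents of a given type created before it, which a scheduled job could then review or delete. The helper names and host are hypothetical, and the filter parameters `created__date__lt` and `document_type__id` are assumptions to verify against the API documentation of your Paperless-ngx version. Nothing is sent or deleted by this code.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def retention_cutoff(today: date, retention_days: int) -> date:
    """Documents created before the returned date are past retention."""
    return today - timedelta(days=retention_days)


def expired_documents_url(base_url: str, cutoff: date, document_type_id: int) -> str:
    """Build a documents query for items created before `cutoff`.

    The filter names below are assumed; check them against your instance.
    """
    params = {
        "created__date__lt": cutoff.isoformat(),
        "document_type__id": document_type_id,
    }
    return f"{base_url.rstrip('/')}/api/documents/?{urlencode(params)}"
```

Keeping the cutoff calculation separate from the request makes the retention period easy to test and review before any deletion step is attached to it.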
Integration with Nextcloud and the Node platform
For organisations also running Nextcloud Private Cloud for file storage, Paperless-ngx and Nextcloud are complementary systems: Paperless-ngx handles the document management workflow (ingestion, OCR, classification, archiving) while Nextcloud handles collaborative file storage and sharing.
Documents processed by Paperless-ngx can be stored in a Nextcloud-connected storage backend, making them accessible through both the Paperless-ngx search interface and the Nextcloud file browser. Teams that prefer the familiarity of a file explorer interface retain it while gaining the full-text search capability of Paperless-ngx.
Keycloak single sign-on
Paperless-ngx integrates with Keycloak via OpenID Connect, allowing staff to access the document management system with their existing corporate credentials. No separate passwords, no separate user directory. MFA requirements enforced at the identity layer apply consistently to document access.
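For reference, OpenID Connect login in Paperless-ngx is configured through environment variables that enable the django-allauth OpenID Connect provider. The sketch below generates those values for a Keycloak realm. The helper function, host, realm and client names are hypothetical, and the discovery URL assumes a current Keycloak release that serves realms under `/realms/<realm>`.

```python
import json


def keycloak_oidc_environment(
    keycloak_base_url: str, realm: str, client_id: str, client_secret: str
) -> dict[str, str]:
    """Return environment variables that point Paperless-ngx at a Keycloak realm."""
    # Assumed discovery path for recent Keycloak versions.
    discovery_url = (
        f"{keycloak_base_url.rstrip('/')}/realms/{realm}"
        "/.well-known/openid-configuration"
    )
    providers = {
        "openid_connect": {
            "APPS": [
                {
                    "provider_id": "keycloak",
                    "name": "Keycloak",
                    "client_id": client_id,
                    "secret": client_secret,
                    "settings": {"server_url": discovery_url},
                }
            ]
        }
    }
    return {
        "PAPERLESS_APPS": "allauth.socialaccount.providers.openid_connect",
        # The provider configuration is passed as a JSON string.
        "PAPERLESS_SOCIALACCOUNT_PROVIDERS": json.dumps(providers),
    }
```

In practice the client secret would come from a secrets store rather than being written into a compose file.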
The compliance case for document management: regulated organisations face specific obligations around document retention, access control and auditability that shared drives cannot satisfy. A law firm must demonstrate who accessed a client file and when. A healthcare organisation must enforce retention schedules and respond to right-of-access requests. A financial services business must maintain audit trails for regulatory review. Paperless-ngx, deployed by Node on infrastructure you control, provides the technical controls for all of these requirements. Your document archive is not in a vendor's cloud. It is on infrastructure you own, with access logs you control, under retention policies you define.
Talk to us about document management.
Drop us a line and our team will discuss your document volumes, ingestion sources, compliance requirements and how a managed Paperless-ngx deployment can replace your current approach.
Frequently asked questions
Who provides managed Paperless-ngx hosting in the UK?
Node Digital deploys and manages Paperless-ngx on UK infrastructure or in your own cloud, with encrypted storage, automated backups, upgrades, monitoring and Keycloak single sign-on included in the managed service.
Is Paperless-ngx suitable for business and regulated use?
Yes, when deployed properly. Paperless-ngx provides document-level permissions, audit logging and retention policies, and our managed deployment adds encrypted volumes, encrypted backups and UK data residency to meet the expectations of legal, financial and healthcare organisations.
How does Paperless-ngx make scanned documents searchable?
Scanned documents are processed with Tesseract OCR, which extracts the text so it can be indexed for full-text search; PDFs that already contain a text layer are indexed directly. Paperless-ngx also produces searchable PDFs that contain the original scan plus an invisible text layer.
Can Paperless-ngx ingest documents automatically from email and scanners?
Yes. Paperless-ngx monitors email inboxes and watched network folders, so supplier invoices arriving by email and documents from network scanners are captured, OCR-processed and classified without manual filing.
Does Paperless-ngx help with GDPR retention and subject access requests?
Yes. You can define retention periods that automatically delete documents when they expire, and full-text search makes it practical to locate every document relating to a data subject when responding to access or erasure requests.