🎉 Early access opening October 2025 • First 100 users get lifetime 50% off
From messy PDFs to reliable spreadsheets
Extract tables with provenance, validation, and APIs—so your data is right the first time.
SHRP.app: Basic PDF to Excel conversion available now. Kepler Docs: Advanced batch processing, mappings, and API coming soon.
Turn PDFs Into Reliable Data Pipelines
From one-off extractions to automated workflows with provenance tracking and validation
Upload & Queue
Batch upload PDFs or ZIP files. Each file becomes a tracked job with real-time status updates and progress monitoring.
Extract & Validate
Apply mapping templates with built-in validations. Check running balances, detect duplicates, and verify data integrity automatically.
Export & Automate
Download clean XLSX/CSV/JSON with full provenance tracking, or integrate via API with webhooks for automated workflows.
What We're Building
Three foundational pillars that make PDF extraction reliable, controllable, and integrated
Accuracy you can trust
Multipage stitching, header/footer scrubbing, deskewing of rotated scans, multi-level headers, and type inference for dates, currency, and percentages.
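To make the type inference idea concrete, here is a minimal sketch of classifying one raw cell string as a date, percentage, currency amount, or plain text. The function name `infer_cell`, the list of date formats, and the supported currency symbols are assumptions made for this illustration; they do not describe the Kepler Docs engine.

```python
import re
from datetime import datetime
from decimal import Decimal

# Date formats tried in order. Illustrative only, not a product setting.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

# Optional parentheses or minus sign, a currency symbol, then digits with commas.
_CURRENCY = re.compile(r"^\(?-?[$€£]\s?[\d,]+(\.\d+)?\)?$")
_PERCENT = re.compile(r"^-?\d+(\.\d+)?\s?%$")


def infer_cell(text: str):
    """Return a (kind, value) pair for one raw cell string (hypothetical helper)."""
    raw = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return "date", datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    if _PERCENT.match(raw):
        # "12.5%" becomes the fraction 0.125
        return "percent", Decimal(raw.rstrip("%").strip()) / Decimal(100)
    if _CURRENCY.match(raw):
        # Accounting style "(45.10)" and a leading minus both mean negative
        negative = raw.startswith("(") or "-" in raw
        amount = Decimal(re.sub(r"[^\d.]", "", raw))
        return "currency", -amount if negative else amount
    return "text", raw
```

Using `Decimal` rather than `float` keeps currency values exact, which matters once totals and balances are validated downstream.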
Control you can tune
Visual mappings, YAML/JSON data contracts, rule engine for if/then transforms, and human-in-the-loop review with change history.
Integrations that fit
Batch & API with webhooks, exports to XLSX/CSV/JSONL, and connectors for Sheets, BigQuery, Snowflake, S3, Drive, and Dropbox.
How Kepler Docs Compares
Feature | Open Source (Camelot/Tabula) | Enterprise OCR (Google/Azure) | Kepler Docs |
---|---|---|---|
Accuracy | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
Provenance Tracking | ✗ | ✗ | ✓ |
Built-in Validation | ✗ | Partial | ✓ |
Batch Processing | Manual | ✓ | ✓ |
Pricing | Free | $$$ | $ |
Enterprise-Grade Data Processing
Advanced features that turn PDF extraction into reliable, repeatable data pipelines
Mapping Templates
Reusable document templates with schema definitions, column mappings, type inference, and transformation rules. Save once, reuse on every batch from that vendor.
Advanced Validations
Running balance checks for statements, column sum validation for invoices, duplicate detection with configurable windows, and date sanity checks.
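As a rough illustration of two of these checks, the sketch below verifies a statement's running balance and flags likely duplicates inside a configurable day window. The `Txn` record, both function names, and the default three-day window are hypothetical choices for this example, not the product's actual rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    posted: date
    description: str
    amount: Decimal   # signed: credits positive, debits negative
    balance: Decimal  # running balance as printed on the statement


def check_running_balance(opening: Decimal, txns: list) -> list:
    """Return indexes of rows whose printed balance disagrees with the computed one."""
    bad = []
    expected = opening
    for i, t in enumerate(txns):
        expected += t.amount
        if expected != t.balance:
            bad.append(i)
            expected = t.balance  # resync so one bad row does not flag every later row
    return bad


def find_duplicates(txns: list, window_days: int = 3) -> list:
    """Return (earlier, later) index pairs with equal description and amount
    posted within window_days of each other."""
    window = timedelta(days=window_days)
    last_seen = {}
    pairs = []
    for i, t in enumerate(txns):
        key = (t.description, t.amount)
        j = last_seen.get(key)
        if j is not None and abs(t.posted - txns[j].posted) <= window:
            pairs.append((j, i))
        last_seen[key] = i
    return pairs
```

Resyncing to the printed balance after a mismatch is a design choice: it reports the row where the error occurs instead of every row after it.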
Full Provenance
Split-view interface showing PDF source alongside extracted data. Hover any cell to see exact source location, confidence scores, and parser notes.
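One way to picture per-cell provenance is as a small record attached to every extracted value. The sketch below is an assumed data shape for illustration only; the field names (`page`, `bbox`, `confidence`, `note`) and the review threshold are hypothetical and not taken from a published Kepler Docs schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Provenance:
    page: int                                # 1-based page number in the source PDF
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) on that page
    confidence: float                        # 0.0 to 1.0
    note: str = ""                           # free-form parser note


@dataclass(frozen=True)
class Cell:
    value: str
    source: Provenance


def needs_review(cells: List[Cell], threshold: float = 0.9) -> List[Cell]:
    """Return the cells whose confidence falls below the review threshold."""
    return [c for c in cells if c.source.confidence < threshold]
```

Keeping the source location next to the value is what lets a reviewer jump from a spreadsheet cell back to the exact region of the PDF.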
Pre-Built Templates
Ready-made mapping templates with validations and transformations for common document types
Invoices
Line items extraction, tax calculations, total validation, vendor mapping
Bank Statements
Transaction parsing, running balance validation, duplicate detection
Financial Reports
P&L statements, balance sheets with multi-level headers and calculations
Shipping Documents
Purchase orders, packing lists, shipping manifests with item tracking
Advanced Processing Platform
Kepler Docs will add batch processing, visual mappings, API access, and advanced validations to the core conversion capabilities.
Batch Processing & Jobs
Upload multiple PDFs or ZIP files. Track processing in real-time with status updates and progress indicators.
Invoice # | Amount |
---|---|
12345 | $1,250.00 |
Provenance & Validation
See exactly where every data point came from with our split-view interface. Built-in validations ensure data accuracy.
Built-in Validations
Automatic validation ensures your data is accurate and trustworthy before export.
# Create an extraction job
curl -X POST https://api.keplerdocs.com/v1/jobs \
  -H 'Authorization: Bearer <API_KEY>' \
  -F file=@statement.pdf \
  -F mapping=bank_statement_v1

# Receive webhook with export URL
POST /webhooks/kepler {
  "job_id": "job_123",
  "status": "succeeded",
  "export_url": "https://.../job_123.xlsx"
}
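On the receiving side, a handler might look something like the sketch below. It assumes the webhook body is JSON carrying the three fields shown in the sample payload (`job_id`, `status`, `export_url`); the function name is hypothetical, and any other status value is simply treated as "not ready".

```python
import json
from typing import Optional


def export_url_from_webhook(body: bytes) -> Optional[str]:
    """Return the export URL for a succeeded job, or None for any other status.

    Assumes the payload shape from the sample above; not an official SDK call.
    """
    event = json.loads(body)
    if event.get("status") != "succeeded":
        return None
    return event.get("export_url")
```

A real endpoint would also want to authenticate the request before trusting its contents; that step is omitted here because the page does not describe how webhooks are authenticated.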
Developer-First API
RESTful API with webhooks, SDKs, and comprehensive documentation for seamless integration into your existing workflows.
- ✓ Async processing with webhook callbacks
- ✓ Python & JavaScript SDKs
- ✓ Batch uploads and exports
- ✓ Custom mappings and validations
- ✓ Idempotency keys and signed URLs
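For readers new to idempotency keys, one common client-side approach is to derive the key from the request content, so that retrying the same upload yields the same key and the server can recognize the repeat. The sketch below shows that general technique; the function name is hypothetical, and how Kepler Docs will accept such a key (for example, which header it uses) is not stated on this page.

```python
import hashlib


def idempotency_key(file_bytes: bytes, mapping: str) -> str:
    """Derive a stable key from the mapping name and file content (illustrative)."""
    digest = hashlib.sha256()
    digest.update(mapping.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", b"c") and ("a", b"bc") hash differently
    digest.update(file_bytes)
    return digest.hexdigest()
```

The same file submitted with a different mapping produces a different key, so a deliberate reprocess under a new template is not mistaken for a retry.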
Building in Public - Our Roadmap
Core Extraction Engine
95%+ accuracy on financial documents
API & Webhooks
RESTful API with async processing
Zapier Integration
Connect with 5000+ apps
On-Premise Option
Self-hosted enterprise deployment
Enterprise-Grade Security & Reliability
Production-ready infrastructure with security controls and reliability guarantees
Data Controls
Configurable retention policies (delete on completion, 24h default, or custom). Choose US or EU processing regions. Full audit trails and signed secure links.
Enterprise Ready
SSO/SAML authentication, role-based access control (RBAC), comprehensive audit logs, and API key management with usage tracking.
Reliability & Scale
99.9% uptime SLA, automatic retries with dead letter queues, back-pressure handling, and accuracy testing on every release.
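The retry and dead letter queue pattern mentioned above can be sketched in a few lines. This is a generic illustration of the technique, assuming an in-memory list as the dead letter queue and exponential backoff between attempts; the names and defaults are hypothetical and say nothing about how the production system is built.

```python
import time


def run_with_retries(job, handler, dead_letters, max_attempts=3,
                     base_delay=0.5, sleep=time.sleep):
    """Call handler(job), retrying on failure with exponential backoff.

    After max_attempts failures the job and its last error are appended to
    dead_letters for later inspection, and None is returned.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(job)
        except Exception as exc:
            if attempt == max_attempts:
                dead_letters.append((job, repr(exc)))
                return None
            # Wait base_delay, then 2x, then 4x, and so on
            sleep(base_delay * 2 ** (attempt - 1))
```

Parking failed jobs in a separate queue keeps one unreadable PDF from blocking the rest of a batch while preserving it for review.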
Ready to Transform Your PDF Processing?
Join thousands of professionals already on our waitlist. Be the first to access Kepler Docs when we launch and get exclusive early adopter benefits.
🎉 Early access, exclusive pricing, and priority support. No spam, ever.
Kepler Docs is named in homage to Johannes Kepler's approach of finding clean, reliable structure in messy observations. We're not affiliated with any Kepler organization—just inspired by the pursuit of turning chaotic data into usable, trustworthy information.