🎉 Early access opening October 2025 • First 100 users get lifetime 50% off

From messy PDFs to reliable spreadsheets

Extract tables with provenance, validation, and APIs—so your data is right the first time.

SHRP.app: Basic PDF to Excel conversion available now.
Kepler Docs: Advanced batch processing, mappings, and API coming soon.

500+ on waitlist • Launching Sept 2025 • By makers of SHRP.app

Turn PDFs Into Reliable Data Pipelines

From one-off extractions to automated workflows with provenance tracking and validation

1. Upload & Queue

Batch upload PDFs or ZIP files. Each file becomes a tracked job with real-time status updates and progress monitoring.

2. Extract & Validate

Apply mapping templates with built-in validations. Check running balances, detect duplicates, and verify data integrity automatically.

3. Export & Automate

Download clean XLSX/CSV/JSON with full provenance tracking, or integrate via API with webhooks for automated workflows.

What We're Building

Three foundational pillars that make PDF extraction reliable, controllable, and integrated

Accuracy you can trust

Multipage table stitching, header/footer scrubbing, deskewing of rotated scans, multi-level header handling, and type inference for dates, currency, and percentages.
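As a rough illustration of what type inference for dates, currency, and percentages can look like, here is a minimal Python sketch. The function name `infer_cell`, the list of date formats, and the regular expressions are assumptions made for this example; they do not describe how Kepler Docs works internally.

```python
import re
from datetime import datetime
from decimal import Decimal

# Date layouts to try, in order. This short list is an assumption for the sketch.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

# Currency: optional parentheses or minus sign, a currency symbol, then digits.
CURRENCY_RE = re.compile(r"^\(?-?[$€£]\s?[\d,]+(\.\d+)?\)?$")
PERCENT_RE = re.compile(r"^-?\d+(\.\d+)?\s?%$")


def infer_cell(raw: str):
    """Return (type_name, value) for one raw cell string."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return "date", datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # not this layout, try the next one
    if CURRENCY_RE.match(text):
        # Accounting-style parentheses and a leading minus both mean negative.
        negative = text.startswith("(") or "-" in text
        amount = Decimal(re.sub(r"[^\d.]", "", text))
        return "currency", -amount if negative else amount
    if PERCENT_RE.match(text):
        return "percent", Decimal(text.rstrip("%").strip()) / 100
    return "text", text
```

With these rules, `infer_cell("$1,250.00")` yields a currency value of `Decimal("1250.00")`, and anything that matches no rule falls back to plain text rather than being guessed at.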

⚙️

Control you can tune

Visual mappings, YAML/JSON data contracts, rule engine for if/then transforms, and human-in-the-loop review with change history.

🔗

Integrations that fit

Batch & API with webhooks, exports to XLSX/CSV/JSONL, and connectors for Sheets, BigQuery, Snowflake, S3, Drive, and Dropbox.

How Kepler Docs Compares

Feature | Open Source (Camelot/Tabula) | Enterprise OCR (Google/Azure) | Kepler Docs
Accuracy: ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Provenance Tracking
Built-in Validation: Partial
Batch Processing: Manual
Pricing: Free $$$$

Enterprise-Grade Data Processing

Advanced features that turn PDF extraction into reliable, repeatable data pipelines

🗃️

Mapping Templates

Reusable document templates with schema definitions, column mappings, type inference, and transformation rules. Save once, reuse on every batch from that vendor.
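To make the idea of a reusable mapping template concrete, the sketch below models one as plain Python data and applies it to a single extracted row. The template layout (`columns`, `target`, `type`), the template name, and the `apply_template` helper are hypothetical and chosen only for illustration; the actual Kepler Docs template format is not published here.

```python
# Hypothetical template: field names and structure are illustrative only,
# not the Kepler Docs schema.
TEMPLATE = {
    "name": "acme_invoice_v1",
    "columns": {
        "Inv. No": {"target": "invoice_number", "type": "str"},
        "Amt": {"target": "amount", "type": "float"},
    },
}

# One cast function per declared type.
CASTS = {
    "str": str,
    "float": lambda s: float(s.replace("$", "").replace(",", "")),
}


def apply_template(template: dict, row: dict) -> dict:
    """Rename source columns to target fields and cast each value."""
    out = {}
    for source, spec in template["columns"].items():
        if source not in row:
            raise KeyError(f"missing source column: {source}")
        out[spec["target"]] = CASTS[spec["type"]](row[source])
    return out
```

Applied to `{"Inv. No": "12345", "Amt": "$1,250.00"}`, this returns `{"invoice_number": "12345", "amount": 1250.0}`. Keeping the template as data rather than code is what would let it be saved once and reused for every batch from the same vendor.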

Advanced Validations

Running balance checks for statements, column sum validation for invoices, duplicate detection with configurable windows, and date sanity checks.
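A running balance check can be sketched in a few lines: each row's stated balance should equal the previous balance plus the row's signed amount. The function below is an assumed, simplified version for illustration, not the product's implementation.

```python
from decimal import Decimal


def check_running_balance(opening, rows):
    """Return indexes of rows whose stated balance does not equal the
    previous balance plus the row's signed amount.

    rows: iterable of (amount, stated_balance) pairs as Decimal values.
    """
    failures = []
    expected = opening
    for i, (amount, stated) in enumerate(rows):
        expected += amount
        if expected != stated:
            failures.append(i)
            expected = stated  # resync so one bad row is reported only once
    return failures
```

Using `Decimal` instead of floats avoids rounding noise that would otherwise produce false failures on cent-level comparisons.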

🔍

Full Provenance

Split-view interface showing PDF source alongside extracted data. Hover any cell to see exact source location, confidence scores, and parser notes.
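One way to picture per-cell provenance is a small record attached to every extracted value. The field names below (`page`, `bbox`, `confidence`, `note`) and the `low_confidence` helper are assumptions for this sketch and not a documented Kepler Docs data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellProvenance:
    page: int  # 1-based page number in the source PDF
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points
    confidence: float  # 0.0 to 1.0
    note: str = ""  # free-form parser note


@dataclass(frozen=True)
class Cell:
    value: str
    source: CellProvenance


def low_confidence(cells, threshold=0.9):
    """Return the cells a reviewer should look at first."""
    return [c for c in cells if c.source.confidence < threshold]
```

A split-view interface could use `page` and `bbox` to highlight the source region when a cell is hovered, and `low_confidence` to build a review queue.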

Pre-Built Templates

Ready-made mapping templates with validations and transformations for common document types

📄

Invoices

Line items extraction, tax calculations, total validation, vendor mapping

Multi-line items
Tax validation
Total checks
🏦

Bank Statements

Transaction parsing, running balance validation, duplicate detection

Running balances
Duplicate detection
Date validation
📊

Financial Reports

P&L statements, balance sheets with multi-level headers and calculations

Multi-level headers
Financial formulas
Period validation
📦

Shipping Documents

Purchase orders, packing lists, shipping manifests with item tracking

Item tracking
Quantity validation
Address parsing

Advanced Processing Platform

Kepler Docs will add batch processing, visual mappings, API access, and advanced validations to the core conversion capabilities.

Batch Processing & Jobs

Upload multiple PDFs or ZIP files. Track processing in real-time with status updates and progress indicators.

Drag & drop with progress tracking
Real-time job status monitoring
Bulk export in multiple formats
invoice_batch_001.pdf: ✓ Completed
bank_statements.zip: ⏳ Processing
PDF Source
INVOICE #12345
Date: 2025-01-15
Amount: $1,250.00
← Source data
Extracted Data
Invoice # | Amount
12345 | $1,250.00

Provenance & Validation

See exactly where every data point came from with our split-view interface. Built-in validations ensure data accuracy.

Source highlighting and traceability
Automatic data validation checks
Confidence scores and error detection

Built-in Validations

Automatic validation ensures your data is accurate and trustworthy before export.

Running balance validation for statements
Column sum checks for line items
Duplicate detection with configurable windows
Running Balance Check: ✓ Passed
All 45 rows validated
Duplicate Detection: ✓ Passed
No duplicates found
Date Sanity Check: ✗ Failed
2 future dates detected
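The duplicate detection and date sanity checks shown above can be approximated as follows. This is a simplified sketch: treating two rows as duplicates when they share description and amount within `window_days`, and the function names themselves, are assumptions made for the example.

```python
from datetime import timedelta


def find_duplicates(rows, window_days=3):
    """Return index pairs (i, j) where two rows share description and amount
    and their dates are at most window_days apart.

    rows: list of (date, description, amount) tuples.
    """
    pairs = []
    last_seen = {}  # (description, amount) -> index of most recent such row
    order = sorted(range(len(rows)), key=lambda i: rows[i][0])
    for i in order:
        day, desc, amount = rows[i]
        key = (desc, amount)
        j = last_seen.get(key)
        if j is not None and day - rows[j][0] <= timedelta(days=window_days):
            pairs.append((j, i))
        last_seen[key] = i
    return pairs


def future_dates(rows, today):
    """Return indexes of rows dated after today."""
    return [i for i, row in enumerate(rows) if row[0] > today]
```

The window matters because a legitimate recurring charge (same payee, same amount, a month apart) should not be flagged, while the same transaction appearing twice within a few days usually should.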
API Quickstart
# Create an extraction job
curl -X POST https://api.keplerdocs.com/v1/jobs \
  -H 'Authorization: Bearer <API_KEY>' \
  -F file=@statement.pdf \
  -F mapping=bank_statement_v1

# Receive webhook with export URL
POST /webhooks/kepler {
  "job_id": "job_123",
  "status": "succeeded", 
  "export_url": "https://.../job_123.xlsx"
}
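On the receiving side, a webhook handler mostly needs to parse the payload and act once the job has succeeded. The Python sketch below uses only the three fields shown in the payload above; the function name `handle_webhook` is hypothetical, and no status values other than "succeeded" are assumed.

```python
import json


def handle_webhook(body: bytes):
    """Parse a webhook body and return (job_id, export_url) once a job succeeds.

    Returns None for any other status. Only the "succeeded" status appears in
    the quickstart, so other status values are deliberately not interpreted.
    """
    event = json.loads(body)
    if event.get("status") != "succeeded":
        return None
    return event["job_id"], event["export_url"]
```

In a real integration this function would sit behind whatever web framework serves the `/webhooks/kepler` route, and the returned export URL would be passed on to a download step.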

Developer-First API

RESTful API with webhooks, SDKs, and comprehensive documentation for seamless integration into your existing workflows.

  • Async processing with webhook callbacks
  • Python & JavaScript SDKs
  • Batch uploads and exports
  • Custom mappings and validations
  • Idempotency keys and signed URLs
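Idempotency keys let a client retry an upload safely: the same key sent twice should create only one job. A minimal client-side sketch, assuming the key is passed in an `Idempotency-Key` header (a common convention, not confirmed for this API) and derived from the file contents and mapping name:

```python
import hashlib


def idempotency_key(file_bytes: bytes, mapping: str) -> str:
    """Derive a stable key so retrying the same upload reuses the same key."""
    digest = hashlib.sha256()
    digest.update(mapping.encode("utf-8"))
    digest.update(b"\0")  # separator so mapping and file bytes cannot blur together
    digest.update(file_bytes)
    return digest.hexdigest()


def job_headers(api_key: str, file_bytes: bytes, mapping: str) -> dict:
    """Build request headers for a job submission."""
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumed header name; check the API documentation once published.
        "Idempotency-Key": idempotency_key(file_bytes, mapping),
    }
```

Deriving the key from content rather than generating a random one means a client that crashes and restarts still produces the same key for the same file and mapping.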

Building in Public - Our Roadmap

Core Extraction Engine

95%+ accuracy on financial documents

Complete

API & Webhooks

RESTful API with async processing

In Progress

Zapier Integration

Connect with 5000+ apps

Q4 2025

On-Premise Option

Self-hosted enterprise deployment

Q1 2026

Enterprise-Grade Security & Reliability

Production-ready infrastructure with security controls and reliability guarantees

Data Controls

Configurable retention policies (delete on completion, 24h default, or custom). Choose US or EU processing regions. Full audit trails and signed secure links.

Enterprise Ready

SSO/SAML authentication, role-based access control (RBAC), comprehensive audit logs, and API key management with usage tracking.

Reliability & Scale

99.9% uptime SLA, automatic retries with dead letter queues, back-pressure handling, and accuracy testing on every release.
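The retry and dead letter queue pattern mentioned above can be illustrated with a short, generic Python sketch. It shows the general technique only; the function name, parameters, and in-memory lists are assumptions and say nothing about how the Kepler Docs infrastructure is actually built.

```python
import time


def process_with_retries(jobs, handler, max_attempts=3, base_delay=0.0):
    """Run handler on each job, retrying failures with exponential backoff.

    Jobs that still fail after max_attempts go to the dead letter list
    together with the last error, instead of blocking the rest of the queue.
    """
    done, dead_letters = [], []
    for job in jobs:
        for attempt in range(1, max_attempts + 1):
            try:
                done.append(handler(job))
                break
            except Exception as exc:
                if attempt == max_attempts:
                    dead_letters.append((job, repr(exc)))
                else:
                    # Wait base_delay, then 2x, 4x, ... before the next try.
                    time.sleep(base_delay * 2 ** (attempt - 1))
    return done, dead_letters
```

Parking repeatedly failing jobs in a dead letter list keeps one corrupt PDF from stalling a whole batch, while preserving the error for later inspection.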

Ready to Transform Your PDF Processing?

Join the 500+ professionals already on our waitlist. Be among the first to access Kepler Docs when we launch and get exclusive early adopter benefits.

🎉 Early access, exclusive pricing, and priority support. No spam, ever.

Need PDF conversion now? Try SHRP.app for free →

Kepler Docs is named in homage to Johannes Kepler's approach of finding clean, reliable structure in messy observations. We're not affiliated with any Kepler organization—just inspired by the pursuit of turning chaotic data into usable, trustworthy information.