How to Use OkraPDF with ChatGPT & Claude
Talk to your documents the way DeepWiki lets you talk to code.
Two Ways to Work with Your Documents
| Method | Best For | Setup Time |
|--------|----------|------------|
| MCP Connection (Claude Desktop) | Natural conversation with your doc library | 2 minutes |
| Preprocessing + Upload (ChatGPT/Claude) | One-off analysis, maximum control | 5 minutes |
Method 1: Direct Connection via MCP (Recommended)
Just like DeepWiki lets you ask questions about any GitHub repo, OkraPDF lets you ask questions about any PDF you've uploaded—directly from Claude Desktop.
How It Works
┌─────────────────────────────────────────────────────────────────┐
│ Claude Desktop │
│ │
│ You: "What was Q3 revenue across my quarterly reports?" │
│ │
│ Claude: → [MCP] list_documents() │
│ → [MCP] ask_question("Q3-Report.pdf", "total revenue") │
│ → "Based on your Q3 report, revenue was $4.2B..." │
└─────────────────────────────────────────────────────────────────┘
│
▼ OAuth (one-time login)
┌─────────────────────────────────────────────────────────────────┐
│ okrapdf.com/mcp │
│ │
│ Your uploaded PDFs → Pre-extracted tables & text → Fast answers│
└─────────────────────────────────────────────────────────────────┘
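The flow in the diagram can be sketched in Python. This is only an illustration: `FakeOkraClient` and its canned answers are hypothetical stand-ins, and only the tool names `list_documents` and `ask_question` come from the diagram. The real MCP server's request and response shapes may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FakeOkraClient:
    """Hypothetical in-memory stand-in for the OkraPDF MCP server."""

    # Canned answers keyed by (document, question); illustrative data only.
    answers: dict[tuple[str, str], str] = field(default_factory=dict)

    def list_documents(self) -> list[str]:
        # Return the unique document names in sorted order.
        return sorted({doc for doc, _ in self.answers})

    def ask_question(self, document: str, question: str) -> str:
        return self.answers.get((document, question), "no answer found")


def ask_all(client: FakeOkraClient, question: str) -> dict[str, str]:
    """Mirror the diagram: list documents, then ask each one the same question."""
    return {doc: client.ask_question(doc, question) for doc in client.list_documents()}


client = FakeOkraClient(
    answers={
        ("Q1-Report.pdf", "total revenue"): "$3.2B",
        ("Q3-Report.pdf", "total revenue"): "$4.2B",
    }
)
results = ask_all(client, "total revenue")
print(results)
```

In practice Claude decides which tools to call and in what order; the loop here only shows why a single question can fan out into several `ask_question` calls.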
Setup (2 minutes)
- Open Claude Desktop settings
- Add this MCP server:
{
  "mcpServers": {
    "okrapdf": {
      "url": "https://okrapdf.com/mcp",
      "oauth": {
        "client_id": "auto",
        "scopes": ["documents:read"]
      }
    }
  }
}
- Claude will prompt you to log in (Google/GitHub SSO)
- Start asking questions about your documents
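If you already have other MCP servers configured, the OkraPDF entry has to be merged into the existing settings rather than pasted over them. Here is a minimal sketch of that merge. It works on plain dictionaries and does not cover where the settings file lives, since that depends on your installation. `add_okrapdf_server`, `other-server`, and the `example.com` URL are hypothetical names used only for illustration.

```python
import json

# The server entry from the setup step, copied as-is.
OKRAPDF_ENTRY = {
    "url": "https://okrapdf.com/mcp",
    "oauth": {"client_id": "auto", "scopes": ["documents:read"]},
}


def add_okrapdf_server(config: dict) -> dict:
    """Return a copy of config with the okrapdf server added, keeping other servers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["okrapdf"] = OKRAPDF_ENTRY
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"url": "https://example.com/mcp"}}}
updated = add_okrapdf_server(existing)
print(json.dumps(updated, indent=2))
```

The helper copies the dictionaries before changing them, so the original settings stay untouched if you decide not to save the result.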
Example Conversation
You: "What are the revenue figures across my uploaded quarterly reports?"
Claude: I'll search your documents and extract the revenue data.
→ [Calling list_documents...]
→ Found: Q1-Report.pdf, Q2-Report.pdf, Q3-Report.pdf
→ [Calling ask_question on each...]
Based on your quarterly reports:
• Q1: $3.2B
• Q2: $3.8B
• Q3: $4.2B
Revenue grew 31% across the three quarters.
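The 31% figure is easy to check by hand. A short sketch of the arithmetic, using only the three revenue values from the example conversation (`percent_growth` is an illustrative helper name):

```python
def percent_growth(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100


# Revenue in billions of dollars, taken from the example conversation.
revenue = {"Q1": 3.2, "Q2": 3.8, "Q3": 4.2}

overall = percent_growth(revenue["Q1"], revenue["Q3"])
print(f"Q1 to Q3: {overall:.2f}%")  # 31.25%, which rounds to the quoted 31%
```

Spot checks like this are a quick way to confirm that a summary figure follows from the numbers the model extracted.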
Why This Works
Unlike uploading files every time, MCP uses pre-extracted context:
- Tables already parsed into structured data
- OCR already run on scanned documents
- Answers come back in seconds, not minutes
Method 2: Preprocessing + Upload
For ChatGPT users, or when you need maximum control over what the AI sees.
Why Preprocess PDFs?
When you upload a PDF directly to ChatGPT, it relies on basic text extraction, which can skip lines or misread tables. When you preprocess with OkraPDF, ChatGPT instead receives clean Markdown files with the table structure intact, which this guide puts at roughly 95%+ table accuracy.
What Gets Lost Without Preprocessing
| Document Element | What ChatGPT Sees | The Problem |
|------------------|-------------------|-------------|
| Multi-row tables | Fragmented text chunks | Rows split across chunks, losing context |
| Nested tables | Jumbled text | Structure completely destroyed |
| Footnotes & references | Disconnected text | Numbers without their explanations |
| Scanned documents | Nothing or garbage | Basic OCR fails on complex layouts |
The Workflow
┌─────────────────┐
│ Your PDF │
└────────┬────────┘
│
▼
┌──────────────────────────────────────┐
│ OkraPDF │
│ Visual AI → Structure Detection │
│ → Page-by-Page Markdown Files │
└──────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Download ZIP │ ← Individual .md files per page
│ (pages.zip) │
└────────┬────────┘
│
┌────┴────┐
▼ ▼
┌────────┐ ┌────────┐
│ChatGPT │ │ Claude │
│Projects│ │ │
└────────┘ └────────┘
Step 1: Process Your PDF in OkraPDF
- Go to okrapdf.com
- Upload your PDF (drag & drop or paste URL)
- Wait for processing to complete (green status bar)
- Review extraction in the split-view
What OkraPDF Preserves
- Table structure with row/column relationships
- Headers and footers in context
- Multi-page table continuations
- Footnote references
Step 2: Download the ZIP File
Once processing completes, click the Export button and select Download ZIP.
You'll get a pages.zip file containing:
pages.zip
├── page_001.md
├── page_002.md
├── page_003.md
├── ...
└── manifest.json
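If you want to inspect the archive yourself before uploading it, Python's standard `zipfile` module is enough. The sketch below builds a tiny in-memory stand-in for `pages.zip` so it runs without a real export. The sample contents are made up, and nothing is assumed about the structure of `manifest.json`.

```python
import io
import json
import zipfile


def build_sample_zip() -> bytes:
    """Create a tiny in-memory stand-in for pages.zip (contents are made up)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("page_002.md", "# Page 2\n")
        archive.writestr("page_001.md", "# Page 1\n")
        archive.writestr("manifest.json", json.dumps({"note": "schema not assumed"}))
    return buffer.getvalue()


def list_pages(zip_bytes: bytes) -> list[str]:
    """Return the page_*.md names in page order."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
    # Zero-padded page numbers sort correctly as plain strings.
    return sorted(n for n in names if n.startswith("page_") and n.endswith(".md"))


pages = list_pages(build_sample_zip())
print(pages)
```

For a real export, you would open the file directly with `zipfile.ZipFile("pages.zip")` instead of building sample bytes.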
Each page is a clean Markdown file with tables properly formatted:
| Segment | 2024 | 2023 |
|---------|------|------|
| Gaming | 12.5B | 10.2B |
| Data Center | 47.5B | 15.0B |
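Because the tables are plain Markdown, they are also easy to load into your own scripts. Here is a minimal sketch that turns the sample table into a list of row dictionaries. `parse_markdown_table` is an illustrative helper, not part of OkraPDF, and it handles only simple pipe tables (no escaped pipes, no empty first or last cells).

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse a simple pipe table into a list of row dictionaries."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    # Split each line on pipes after dropping the outer pipes.
    rows = [[cell.strip() for cell in line.strip("|").split("|")] for line in lines]
    header, body = rows[0], rows[2:]  # rows[1] is the |---| separator line
    return [dict(zip(header, row)) for row in body]


table = """
| Segment | 2024 | 2023 |
|---------|------|------|
| Gaming | 12.5B | 10.2B |
| Data Center | 47.5B | 15.0B |
"""

records = parse_markdown_table(table)
print(records[1]["2024"])
```

Each record keeps its column names, so a value such as the 2024 Data Center figure stays attached to both its row and its header.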
Step 3: Use with ChatGPT
Choose the method that fits your workflow:
Option A: ChatGPT Projects (Best for Recurring Work)
Use this if you need to ask questions about these reports regularly (e.g., "Compare Q1 and Q4"). A project gives you a persistent library for your files, so you don't have to re-upload them for each chat.
Requirements: ChatGPT Plus, Team, or Enterprise
Steps:
- Open the sidebar in the ChatGPT desktop app
- Click Projects → New Project
- Name it (e.g., "Financial Reports 2024")
- In Project settings, find Knowledge or Files
- Click Add source → Upload files
- Upload your pages.zip file directly (ChatGPT reads inside zips)
Example prompts:
"Search the Q3 files for the Adjusted EBITDA table and give me the rows where the value exceeds $5M."
"Compare revenue trends between page_010.md and page_045.md"
Option B: Zip & Script (Best for Precision)
Use this when you need an exhaustive, verifiable search (e.g., "Find every single mention of 'churn' across 500 pages"). ChatGPT runs Python code that scans every file, so you can check the matches against the code it actually ran.
Requirements: ChatGPT with Advanced Data Analysis (Plus/Team/Enterprise)
Steps:
- Open a new ChatGPT chat
- Drag the pages.zip file into the message bar
- Use this exact prompt:
I have uploaded a zip file containing page-by-page markdown reports.
I need you to perform a precise search using Python.
- Unzip the file into your environment
- Iterate through every .md file in the directory
- Search specifically for the term '[YOUR KEYWORD HERE]'
- For every match found, print a list containing:
- The Filename (e.g., page_042.md)
- The exact line of text where the match appears
Do not summarize. Just list the matches.
- ChatGPT will run Python code to grep through your files
- Click "Analyzing..." to verify the actual code it ran
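For reference, the code ChatGPT writes for this prompt will usually look something like the sketch below. This is an approximation, not the exact code ChatGPT produces. `search_zip` is an illustrative name, and the sample archive is built in memory with made-up page contents so the example runs on its own.

```python
import io
import zipfile


def search_zip(zip_bytes: bytes, keyword: str) -> list[tuple[str, str]]:
    """Return (filename, line) for every line containing keyword, ignoring case."""
    matches = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".md"):
                continue  # skip manifest.json and anything else that is not a page
            text = archive.read(name).decode("utf-8", errors="replace")
            for line in text.splitlines():
                if keyword.lower() in line.lower():
                    matches.append((name, line.strip()))
    return matches


# Build a small in-memory archive so the sketch runs without a real pages.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("page_001.md", "Revenue rose.\nChurn fell to 2%.\n")
    archive.writestr("page_002.md", "No relevant text here.\n")
    archive.writestr("page_003.md", "Customer churn by segment\n")

for filename, line in search_zip(buffer.getvalue(), "churn"):
    print(filename, "|", line)
```

When you review the code ChatGPT ran, check for the same two things: that it loops over every `.md` file, and that it reports raw matching lines instead of a summary.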
Which Method Should I Use?
| Method | Best For | Example Question | AI |
|--------|----------|------------------|-----|
| MCP Connection | Ongoing document library | "Search my reports for revenue trends" | Claude |
| ChatGPT Projects | Big picture questions | "What's the revenue trend?" | ChatGPT |
| Zip & Script | Finding specific data | "List every invoice number" | ChatGPT |
Decision Tree
Do you use Claude Desktop?
├─ Yes → Use MCP Connection (fastest, no re-uploads)
└─ No → Do you need recurring access?
├─ Yes → ChatGPT Projects
└─ No → Zip & Script
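The same decision tree, written as a small function in case you want to embed the choice in onboarding docs or tooling. `choose_method` and its parameter names are illustrative only.

```python
def choose_method(uses_claude_desktop: bool, needs_recurring_access: bool) -> str:
    """Encode the decision tree: Claude Desktop first, then recurring access."""
    if uses_claude_desktop:
        return "MCP Connection"
    if needs_recurring_access:
        return "ChatGPT Projects"
    return "Zip & Script"


print(choose_method(uses_claude_desktop=False, needs_recurring_access=True))
```

Note that the second question only matters when the first answer is "No", which is why Claude Desktop users always land on the MCP connection.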
Why OkraPDF Makes AI Better
The Accuracy Difference
| Approach | Table Accuracy | Speed |
|----------|----------------|-------|
| Direct PDF Upload | ~40-60% | Slow (re-parses every time) |
| OkraPDF Preprocessing | ~95%+ | Fast (pre-extracted) |
| OkraPDF MCP | ~95%+ | Fastest (cached + streaming) |
The Key Difference
ChatGPT's text extraction doesn't understand tables. It might see:
Revenue | 2024 | 2023
Gaming | 12.5B |
And chunk right there, losing the 2023 value.
OkraPDF uses visual AI that sees document layout like a human, then extracts complete, self-contained tables ready for AI analysis.
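To see why a cut in the wrong place loses data, here is a toy comparison of two chunking strategies. It is a simplified illustration only: neither function is how ChatGPT or OkraPDF actually splits documents, and the chunk size of 38 characters is chosen to reproduce the broken row shown earlier in this section.

```python
def chunk_by_chars(text: str, size: int) -> list[str]:
    """Naive chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_rows(text: str, rows_per_chunk: int) -> list[str]:
    """Row-aware chunking: never cut inside a line, repeat the header in each chunk."""
    header, *rows = text.splitlines()
    return [
        "\n".join([header, *rows[i:i + rows_per_chunk]])
        for i in range(0, len(rows), rows_per_chunk)
    ]


table = "Revenue | 2024 | 2023\nGaming | 12.5B | 10.2B\nData Center | 47.5B | 15.0B"

naive = chunk_by_chars(table, 38)
aware = chunk_by_rows(table, 1)
print(repr(naive[0]))  # ends right after "12.5B |", so the 2023 value is cut off
print(repr(aware[0]))  # header plus the complete Gaming row
```

The row-aware version keeps every value next to its header, which is the property a self-contained table needs before it is handed to a model.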
Try It Now
For Claude Desktop Users (Recommended)
- Add OkraPDF to your MCP servers (config above)
- Upload PDFs at okrapdf.com
- Ask Claude: "What documents do I have in OkraPDF?"
For ChatGPT Users
- Upload a complex PDF at okrapdf.com
- Download the ZIP when processing completes
- Upload to ChatGPT Projects or use the Zip & Script method
- Compare answers with and without preprocessing
Common Questions
"Can't ChatGPT just read PDFs now?"
Yes, but it uses basic text extraction. Complex layouts—especially tables with merged cells or multi-page spans—still get mangled. OkraPDF uses visual AI that "sees" the document like a human.
"How many pages can I process?"
- Free tier: 5 pages
- Standard plan: 2,000 pages/month
"Is my data secure?"
Yes. AES-256 encryption at rest, TLS 1.3 in transit, auto-deleted after 30 days. See our Security page.
Summary
| Without OkraPDF | With OkraPDF |
|-----------------|--------------|
| Tables break across chunks | Tables stay intact |
| Numbers may be hallucinated | Numbers verified visually |
| ~40-60% accuracy on tables | ~95%+ accuracy on tables |
| Re-upload files every time | MCP: instant access to your library |
OkraPDF isn't a replacement for ChatGPT or Claude. It's the bridge that makes AI dramatically better at understanding your documents.
Think of it like DeepWiki for documents:
- DeepWiki: "Explain how authentication works in this repo"
- OkraPDF: "What's the revenue breakdown in my Q3 report?"
Questions? Email support@okrapdf.com