OCR & Document Extraction using vision models. Repository: getomni-ai/zerox on GitHub.
A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc., vision models just make sense!

Zerox is available as both a Node and Python package.

Node.js SDK: supports vision models from different providers such as OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, Google Gemini, and others.

The maintainFormat option tries to return the markdown in a consistent format by passing the output of the prior page in as additional context for the next page. This requires the requests to run one after another (synchronously), so it is a lot slower, but it is valuable if your documents have a lot of tabular data or frequently have tables that cross pages.

Zerox supports structured data extraction from documents using a schema. This lets you pull specific information from documents in a structured format instead of getting the full markdown conversion. Use extractPerPage to extract data per page instead of from the whole document at once.

Zerox supports a wide range of models across different providers.

Python SDK: supports vision models from different providers such as OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, and others.

The pyzerox.zerox function is an asynchronous API that performs OCR (Optical Character Recognition) to markdown using vision models. It processes PDF files and converts them into markdown format. Make sure to set up the environment variables for the model and the model provider before using this API. Refer to the LiteLLM documentation for setting up the environment and passing the correct model name.

Note that the output is manually wrapped for this documentation for better readability.

This project is licensed under the MIT License.
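The trade-off behind maintainFormat described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Zerox's implementation: fake_vision_model, convert_concurrently, and convert_maintaining_format are hypothetical names, and the "model" is a stub that returns a placeholder string instead of calling any provider.

```python
import asyncio
from typing import List, Optional


async def fake_vision_model(page_image: str, prior_markdown: Optional[str] = None) -> str:
    """Hypothetical stand-in for a vision model request; not part of Zerox."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    context = "with context" if prior_markdown else "no context"
    return f"markdown for {page_image} ({context})"


async def convert_concurrently(pages: List[str]) -> List[str]:
    # Pages are treated as independent, so all requests can be in flight at once.
    return list(await asyncio.gather(*(fake_vision_model(p) for p in pages)))


async def convert_maintaining_format(pages: List[str]) -> List[str]:
    # Each request needs the previous page's output, so pages run one at a time.
    results: List[str] = []
    prior: Optional[str] = None
    for page in pages:
        prior = await fake_vision_model(page, prior)
        results.append(prior)
    return results
```

Running asyncio.run(convert_maintaining_format(["p1", "p2"])) yields one result per page, where every page after the first was produced with the previous page's markdown as context. That dependency is what prevents the requests from overlapping and explains the slowdown.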
Mentions (30d): 0
Reviews: 0
Platforms: 2
Sentiment: 0% (0 positive)
Features
Industry: information technology & services
Employees: 6,000
Funding Stage: Other
Total Funding: $7.9B
npm packages: 20
Web & game developers, this is your jam. Gamedev.js Jam 2026 is back. 🎮 🗓 April 13-26, 2026 🌐 Build an HTML5 game in 13 days 🏆 Prizes + expert feedback 💬 Active community + Discord Theme revealed on day one. Ship something weird. Ship something fun. Just ship it. https://t.co/KPphbM8rTz
Yes, GitHub is 18 years old today. But some things never change. https://t.co/CeDtE5ItYv
The GitHub Actions 2026 security roadmap covers three layers in a shift toward making secure behavior the default. Here’s what’s coming next, and when. ⬇️ https://t.co/kF69g47Z09
Every dev knows security debt piles up fast ... and every repo has a few hidden vulnerabilities. 😅 With GitHub Copilot CLI, you can automate your security triage right from the terminal: 🔍 Run a full security scan 📌 Map findings to the OWASP Top 10 🗂️ Automatically bulk-open GitHub Issues Get started with a new and improved workflow. 👇 https://t.co/m5eGC6Ddrh
Thinking about speaking at a tech conference? 💭 We’d love to hear your story on the stage at #GitHubUniverse this year. Submit your session idea now: https://t.co/rmH7FiR2WZ https://t.co/e9lFnB0B2U
Git the full story here in last year's 20th anniversary Q&A with Linus Torvalds. 📖 https://t.co/qm3ybAT4kI
Linus Torvalds wrote Git in just 10 days after Linux kernel developers lost access to their proprietary tool, BitKeeper, due to licensing disagreements. He went from solving a problem to revolutionizing how software teams collaborate and develop.
Picture it: The year is 2005 and you decide to download a new distributed system called Git (on your Windows XP? MacOS X Tiger? Linux kernel 2.6.11?). Now it's 21 years later and you're hosting your code on GitHub. You can thank this man. 👇🧵
GitHub Copilot cloud agent just got a lot more flexible ✨ You can now use it to research, plan, and make code changes without needing to open a pull request first. https://t.co/zKQ4DeSiC3 https://t.co/Soi08zV4XS
Real-time accessibility checks directly in Nuxt DevTools are now a reality. Check out the Nuxt A11y module. It’s built on axe-core and scans your app as you navigate, highlighting WCAG issues directly on the page with zero production impact. By helping teams catch issues early, this module has the potential to make countless Nuxt apps more inclusive by default. @timdamen_io explains how it works. ▶️
Single-prompt AI workflows often hit a performance plateau. Multi-agent systems can push past it, but they usually require a massive amount of setup. Squad, an open source project built on GitHub Copilot, initializes a preconfigured AI team directly inside your repo. Learn how to run multi-agent workflows that stay inspectable, predictable, and collaborative. https://t.co/1ewya9yPpC
Writing code to automate your repo? Great. Writing Markdown to do it? Pretty sick. GitHub Agentic Workflows: now in technical preview. https://t.co/n9qDJI3JzE
GitHub powers your code, but it can also power your daily life. 🔋 Instead of downloading another productivity app, manage your tasks right where you already work: ✅ Issues for chores and bills 🏷️ Labels for priority and status 📊 Projects for your daily schedule Here’s how to set up your personal operating system. 👇 https://t.co/WsO5Zwpb5C
😬 We've all been there, right? Our latest episode of GitHub for Beginners is all about making sure your projects are secure. Check it out now. https://t.co/MRXVNnv1XD https://t.co/6hogZiyMo6
@photonstorm Congrats on the big 4.0 (and 13th next week) 🎉
Pricing found: $50.10, $48.71, $48.71, $48.71, $9.74
How it works: (1) pass in a file (PDF, DOCX, image, etc.); (2) convert that file into a series of images; (3) pass each image to GPT and ask nicely for Markdown; (4) aggregate the responses and return Markdown.
Listed models include: GPT-4 Vision (gpt-4o), GPT-4 Vision Mini (gpt-4o-mini), GPT-4.1 (gpt-4.1), GPT-4.1 Mini (gpt-4.1-mini).
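The pipeline listed above (file, page images, per-page Markdown, aggregated Markdown) can be sketched roughly as follows. This is a minimal illustration and not the project's code: file_to_page_images, image_to_markdown, and ocr_to_markdown are hypothetical names, page rendering is faked with placeholder image names, and the model call is a stub.

```python
from typing import Callable, List


def file_to_page_images(file_path: str, page_count: int) -> List[str]:
    # Hypothetical stand-in for rendering: returns one placeholder image name per page.
    return [f"{file_path}#page{i}.png" for i in range(1, page_count + 1)]


def image_to_markdown(image: str) -> str:
    # Placeholder for the vision model request that transcribes one page image.
    return f"## {image}\n\n(transcribed content)"


def ocr_to_markdown(
    file_path: str,
    page_count: int,
    transcribe: Callable[[str], str] = image_to_markdown,
) -> str:
    images = file_to_page_images(file_path, page_count)  # file -> page images
    pages = [transcribe(image) for image in images]      # page image -> Markdown
    return "\n\n".join(pages)                            # aggregate in page order
```

Passing the transcription step in as a function keeps the sketch independent of any provider: a real model call could be swapped in for the stub without changing the aggregation logic.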
Based on 55 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.