GenAI Agent Framework, the Pydantic way
Pydantic AI is a Python agent framework designed to help you quickly, confidently, and painlessly build production-grade applications and workflows with Generative AI.

FastAPI revolutionized web development by offering an innovative and ergonomic design, built on the foundation of Pydantic Validation and modern Python features like type hints. Yet despite virtually every Python agent framework and LLM library using Pydantic Validation, when we began to use LLMs in Pydantic Logfire, we couldn't find anything that gave us the same feeling. We built Pydantic AI with one simple aim: to bring that FastAPI feeling to GenAI app and agent development.

Realistically though, no list is going to be as convincing as giving it a try and seeing how it makes you feel. Here's a minimal example of Pydantic AI. The exchange will be very short: Pydantic AI will send the instructions and the user prompt to the LLM, and the model will return a text response.

The docs also include a concise example of using Pydantic AI to build a support agent for a bank (see the complete bank_support.py example). Even a simple agent with just a handful of tools can result in a lot of back-and-forth with the LLM, making it nearly impossible to be confident of what's going on just from reading the code. To understand the flow of such runs, we can watch the agent in action using Pydantic Logfire: set up Logfire, add a few lines of configuration, and you get a live view of your agent in action. See Monitoring and Performance to learn more.

The Pydantic AI documentation is also available in the llms.txt format: Markdown files suited to LLMs, AI coding assistants, and agents. As of today, these files are not automatically leveraged by IDEs or coding agents, but they will use them if you provide a link or the full text.

Read the docs to learn more about building applications with Pydantic AI.
Read the API Reference to understand Pydantic AI's interface.
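The Logfire setup mentioned above is only a couple of lines. A sketch under the assumption you have a Logfire project and write token configured; `instrument_pydantic_ai()` is the instrumentation hook in recent Logfire SDK releases.

```python
import logfire

# Configure the Logfire SDK; 'if-token-present' makes this a no-op
# when no write token is configured, so local runs don't fail.
logfire.configure(send_to_logfire='if-token-present')

# Instrument Pydantic AI so every agent run, model request, and tool
# call appears as a trace in the Logfire UI.
logfire.instrument_pydantic_ai()
```

Add this before creating your agent and every subsequent run shows up in the Logfire view of the agent in action.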
LLM Documentation accuracy solved for free with Buonaiuto-Doc4LLM, the MCP server that gives your AI assistant real, up-to-date docs instead of hallucinated APIs
LLMs often generate incorrect API calls because their knowledge is outdated. The result is code that looks convincing but relies on deprecated functions or ignores recent breaking changes. Buonaiuto Doc4LLM addresses this by providing free AI tools with accurate, version-aware documentation, directly from official sources. It fetches and stores documentation locally (React, Next.js, FastAPI, Pydantic, Stripe, Supabase, TypeScript, and more), making it available offline after the initial sync.

Through the Model Context Protocol, it delivers only the relevant sections, enforces token limits, and validates library versions to prevent mismatches. The system also tracks documentation updates and surfaces only what has changed, keeping outputs aligned with the current state of each project. A built-in feedback loop measures which sources are genuinely useful, enabling continuous improvement.

Search is based on BM25 with TF-IDF scoring, with optional semantic retrieval via Qdrant and local embedding models such as sentence-transformers or Ollama. A lightweight FastAPI + HTMX dashboard provides access to indexed documentation, queries, and feedback insights. Compatible with Claude Code, Cursor, Zed, Cline, Continue, OpenAI Codex, and other MCP-enabled tools.

https://github.com/mbuon/Buonaiuto-Doc4LLM

submitted by /u/mbuon
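The BM25 retrieval the post mentions fits in a few lines of plain Python. This is a simplified, self-contained illustration of the classic scoring formula, not the project's actual implementation:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with classic BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # Term frequency saturates via k1; b normalizes for doc length.
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [["fastapi", "routing", "docs"], ["react", "hooks", "docs"], ["stripe", "api"]]
scores = bm25_scores(["fastapi", "docs"], docs)
```

The FastAPI doc scores highest because it matches both query terms, including the rarer one; the Stripe doc, matching neither, scores zero.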
Research shows auto-generated context makes AI agents 2-3% worse. I tested the opposite approach.
Hey, I've been building in the AI agent space and kept running into the same problem: agents don't really fail at writing code. They fail at understanding how the project works before they start. So they guess. Where to make changes, what pattern to follow, what files are safe to touch. And that's what causes most bad edits.

I came across the ETH Zurich AGENTS.md study showing that auto-generated context can actually degrade agent performance by 2-3%. That matched what I was seeing — dumping more code or bigger prompts didn't help. It just gave the agent more surface area to guess from.

So I tried the opposite: what if you only give the agent the stuff it *can't* infer from reading code? Things like:

- conventions (how routing/auth/testing is actually done in this project)
- constraints (generated files you shouldn't edit, circular deps to avoid)
- structural signals (which files have 50+ dependents — touch with care)
- git signals (what keeps breaking, what was tried and reverted)

I built a CLI (and a few runtime tools so the agent can check itself mid-task) to test this. It scans a repo and generates ~70 lines of AGENTS.md with just that information. No LLM, no API key, runs locally in a few seconds.

Then I ran it against real closed GitHub issues (Cal.com, Hono, Pydantic) with a pinned model. Agents with this context navigated to the right file faster, used the correct patterns, and produced more complete fixes. On one task: 136s vs 241s, with a 66% more thorough patch — from 70 lines of context, not the full repo.

The surprising part: the biggest improvement didn't come from *adding* context. It came from removing everything that didn't matter. This actually lines up with something Karpathy has been saying recently — that agents need a knowledge base, not just more tokens. That distinction clicked after seeing it play out in practice.
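The "emit only non-inferable knowledge" idea can be sketched as a small generator. A hypothetical illustration of the approach described, not the actual sourcebook CLI; the section names and sample items are invented:

```python
def render_agents_md(conventions, constraints, hotspots, git_signals):
    """Render a compact AGENTS.md from four kinds of non-inferable knowledge."""
    sections = [
        ("Conventions", conventions),     # how things are actually done here
        ("Constraints", constraints),     # files and edits to avoid
        ("High-impact files", hotspots),  # structural signals: many dependents
        ("Git signals", git_signals),     # what keeps breaking or was reverted
    ]
    lines = ["# AGENTS.md"]
    for title, items in sections:
        if not items:
            continue  # omit empty sections: less context beats more
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)

md = render_agents_md(
    conventions=["Routing lives in app/routes/, one file per resource"],
    constraints=["src/generated/** is codegen output, never edit"],
    hotspots=["utils/db.py has 63 dependents, touch with care"],
    git_signals=["auth middleware was reverted twice recently"],
)
```

The point of the design: everything emitted is something the agent could not recover by reading source files, and anything empty is dropped rather than padded.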
I also compared against full repo dumps and graph-based tools, and the pattern held — graphs help agents explore, but project knowledge helps them decide. Curious if others have seen the same thing. Feels like most of the problem isn't "more context," it's the wrong kind.

(If anyone's curious, the CLI is called sourcebook — happy to share more, but mostly interested in whether this matches what others are seeing with their agents.)

submitted by /u/re3ze
View originalI made a terminal pet that watches my coding sessions and judges me -- now it's OSS
I really liked the idea of the Claude Code buddy so I created my own that supports infinite variations and customization. It even supports watching plain files and commenting on them!

tpet is a CLI tool that generates a unique pet creature with its own personality, ASCII art, and stats, then sits in a tmux pane next to your editor commenting on your code in real time. It monitors Claude Code session files (or any text file with --follow) through watchdog, feeds the events to an LLM, and your pet reacts in character. My current one is a Legendary creature with maxed-out SNARK and it absolutely roasts my code.

Stuff I think is interesting about it:

- No API key required by default -- uses the Claude Agent SDK, which works with your existing Claude Code subscription. But you can swap in Ollama, OpenAI, OpenRouter, or Gemini for any of the three pipelines (profile generation, commentary, image art) independently. So your pet could be generated by Claude, get commentary from a local Ollama model, and generate sprite art through Gemini if you want.
- Rarity system -- when you generate a pet it rolls a rarity tier (Common through Legendary) which determines stat ranges. The stats then influence the personality of the commentary. A high-CHAOS pet is way more unhinged than a high-WISDOM one.
- Rendering -- ASCII mode works everywhere, but if your terminal supports it there are halfblock and sixel art modes that render AI-generated sprites. It runs at 4fps with a background thread pool so LLM calls don't stutter the display.
- Tech stack -- Python 3.13, Typer, Rich, Pydantic, watchdog. XDG-compliant config paths. Everything's typed and tested (158 tests).

Install with uv (recommended): `uv tool install term-pet`

Or just try it without installing: `uvx --from term-pet tpet`

GitHub: https://github.com/paulrobello/term-pet

MIT licensed.
Would love feedback, especially on the multi-provider config approach and the rendering pipeline.

submitted by /u/probello
I got tired of 3 AM PagerDuty alerts, so I built an AI agent to fix cloud outages while I sleep. (Built with GLM-5.1)
If you've ever been on-call, you know the nightmare. It's 3:15 AM. You get pinged because heavily loaded database nodes in us-east-1 are randomly dropping packets. You groggily open your laptop, ssh into servers, stare at Grafana charts, and manually reroute traffic to the European fallback cluster. By the time you fix it, you've lost an hour of sleep, and the company has lost a solid chunk of change in downtime.

This weekend, for the Z.ai hackathon, I wanted to see if I could automate this specific pain away. Not just "anomaly detection" that sends an alert, but an actual agent that analyzes the failure, proposes a structural fix, and executes it. I ended up building Vyuha AI, a triple-cloud (AWS, Azure, GCP) autonomous recovery orchestrator. Here is how the architecture actually works under the hood.

The Stack

I built this using Python (FastAPI) for the control plane, Next.js for the dashboard, a custom dynamic reverse proxy, and GLM-5.1 doing the heavy lifting for the reasoning engine.

The Problem with 99% of "AI DevOps" Tools

Most AI monitoring tools just ingest logs and summarize them into a Slack message. That's useless when your infrastructure is actively burning. I needed an agent with long-horizon reasoning. It needed to understand the difference between a total node crash (DEAD) and a node that is just acting weird (FLAKY, dropping 25% of packets).

How Vyuha Works (The Triaging Loop)

I set up three mock cloud environments (AWS, Azure, GCP) behind a dynamic FastAPI proxy. A background monitor loop probes them every 5 seconds. I built a "Chaos Lab" into the dashboard so I could inject failures on demand. Here's what happens when I hard-kill the GCP node:

Detection: The monitor catches the 503 Service Unavailable or timeout in the polling cycle.

Context Gathering: It doesn't instantly act. It gathers the current "formation" of the proxy, checks response times of the surviving nodes, and bundles that context.
Reasoning (GLM-5.1): This is where I relied heavily on GLM-5.1. Using ZhipuAI's API, the agent is prompted to act as a senior SRE. It parses the failure, assesses the severity, and figures out how to rebalance traffic without overloading the remaining nodes.

The Proposal: It generates a strict JSON payload with reasoning, severity, and the literal API command required to reroute the proxy.

No Rogue AI (Human-in-the-Loop)

I don't trust LLMs enough to blindly let them modify production networking tables, obviously. So the agent operates on a strict human-in-the-loop philosophy. The GLM-5.1 model proposes the fix, explains why it chose it, and surfaces it to the dashboard. The human clicks "Approve," and the orchestrator applies the new proxy formation.

Evolutionary Memory (The Coolest Feature)

This was my favorite part of the build. Every time an incident happens, the system learns. If the human approves the GLM's failover proposal, the agent runs a separate "Reflection Phase." It analyzes what broke and what fixed it, and writes an entry into a local SQLite database acting as an "Evolutionary Memory Log." The next time a failure happens, the orchestrator pulls relevant past incidents from SQLite and feeds them into the GLM-5.1 prompt. The AI literally reads its own history before diagnosing new problems, so it doesn't make the same mistake twice.

The Struggles

It wasn't smooth. I lost about 4 hours to a completely silent Pydantic validation bug because my frontend chaos buttons were passing the string "dead" but my backend Enums strictly expected "DEAD". The agent just sat there doing nothing. LLMs are smart, but type-safety mismatches across the stack will still humble you.

Try it out

I built this to prove that the future of SRE isn't just better dashboards; it's autonomous, agentic infrastructure. I'm hosting it live on Render/Vercel. Try hitting the "Hard Kill" button on GCP and watch the AI react in real time.
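The silent enum mismatch described above is easy to reproduce with a strict Pydantic schema. A sketch of a proposal payload with hypothetical field names (not Vyuha's actual schema); the fix shown is normalizing case before validation:

```python
from enum import Enum
from pydantic import BaseModel, ValidationError, field_validator

class NodeState(str, Enum):
    HEALTHY = "HEALTHY"
    FLAKY = "FLAKY"
    DEAD = "DEAD"

class FailoverProposal(BaseModel):
    node: str
    state: NodeState
    reasoning: str

    # Without this, a frontend sending "dead" fails validation silently
    # upstream; normalizing case before the enum check fixes the mismatch.
    @field_validator("state", mode="before")
    @classmethod
    def upper_state(cls, v):
        return v.upper() if isinstance(v, str) else v

p = FailoverProposal(node="gcp-1", state="dead", reasoning="503s on health probe")
assert p.state is NodeState.DEAD  # lowercase input is now accepted
```

Truly invalid values still raise `ValidationError`, so the strictness the backend relies on is preserved.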
Would love brutal feedback from any actual SREs or DevOps engineers here. What edge case would break this in a real datacenter?

submitted by /u/Evil_god7
We built an open-source framework for deploying AI agents in production, with built-in Claude Code skills
hey r/ClaudeAI, we just open-sourced Agent2: a production runtime for AI agents built on PydanticAI + FastAPI.

what makes it relevant here: the repo ships with built-in SKILL.md files that teach Claude Code how to use the framework. open the repo in claude code and it already knows how to scaffold agents, add knowledge bases, wire up approvals, and debug issues.

skills included:

- /create-agent — scaffolds a complete agent service
- /building-domain-experts — knowledge-backed document processing
- /adding-knowledge — R2R collections, per-tenant scoping
- /adding-capabilities — pause/resume, approvals, provider routing
- /debugging-agents — systematic diagnosis

we've processed 4M+ documents with it. $200k+ revenue. bootstrapped.

the idea: you describe what you want in claude code, and it builds a production AI agent backend. schema in, API out.

→ https://github.com/duozokker/agent2

MIT licensed. feedback welcome.

submitted by /u/duozokker
I got tired of Claude hallucinating decimal points in financial CSVs, so I built a 3-layer deterministic MCP Server.
Hey everyone,

If you've ever tried feeding a 5,000-row CSV, a messy broker trade history, or a bank statement (like Norma 43 or SEC XBRL) directly into Claude's context window, you know the pain.

**The Token Tax:** Sending raw B2B formats to a context window burns tokens for no reason.

**The Hallucination Risk:** LLMs struggle with strict spatial alignment. One misplaced comma by the AI, and a $100.50 transaction becomes a $10,050.00 disaster.

I realized that "LLM-first" is the wrong architecture for structured B2B data. AI agents shouldn't *read* CSVs; they should query a deterministic middleware. So, I built **ETL-D** and just open-sourced the MCP Server for Claude Desktop.

**The Architecture (The "Waterfall" approach):**

Instead of dumping text to the LLM, when you ask Claude to parse a file, it routes it to the MCP server, which processes it in 3 strict layers:

* **Layer 1 (Heuristics):** 100% Python (`regex`, `dateutil`, strict structural parsers). If it's a known format, it parses instantly. We just ran a load test: 200 parallel requests hit ~70ms response times with **0 LLM calls**. Zero hallucination risk.
* **Layer 2 (Semantic Routing):** If headers are obfuscated, we use a lightweight router to map columns to strict Pydantic schemas.
* **Layer 3 (LLM Fallback):** Only triggered for high-entropy "free-text" noise (using Llama 3.3 70b under the hood to enforce JSON schemas).

Claude just gets a perfectly clean, flattened JSON array back, ready for actual reasoning.

**Try it out:**

I just got it approved on the official Anthropic MCP Registry today. You can check out the source code and how to configure it in your `claude_desktop_config.json` here:

🔗 **GitHub:** [pablixnieto2/etld-mcp-server](https://github.com/pablixnieto2/etld-mcp-server)

Would love to hear how you guys are handling the "Data Tax" and preventing hallucinations in your own agent pipelines. Any feedback on the architecture is welcome!

submitted by /u/PrettyOne8738
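Layer 1's "zero hallucination risk" claim rests on deterministic parsing, and the $100.50-vs-$10,050.00 hazard from the intro is exactly the kind of thing it has to get right. A stdlib-only sketch of such a heuristic (not ETL-D's actual parser): exact `Decimal` arithmetic, and a hard failure instead of a guess when the format is unknown.

```python
import re
from decimal import Decimal

# Matches optional $, thousands separators, and up to 2 decimal places.
AMOUNT = re.compile(r"^\$?\s*(-?[\d,]+(?:\.\d{1,2})?)$")

def parse_amount(raw: str) -> Decimal:
    """Parse a currency string exactly; raise instead of guessing."""
    m = AMOUNT.match(raw.strip())
    if not m:
        raise ValueError(f"unparseable amount: {raw!r}")
    return Decimal(m.group(1).replace(",", ""))

assert parse_amount("$100.50") == Decimal("100.50")  # never 10050.00
```

Anything that doesn't match the known format falls through as an error, which is what lets a waterfall design escalate to the next layer instead of silently mangling a number.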
View originalTheow - Heal your CI automatically with LLMs with 0 clicks and 0 copy pasting context
Theow is an observable, programmatic LLM agent that auto-heals failing Python functions at runtime. Wrap any function with `theow.mark()`, and when it raises, theow intercepts the exception, diagnoses it, and retries transparently. Every LLM call, tool execution, and token spend is traced via OpenTelemetry. Zero prompt engineering. Zero code changes beyond the decorator.

Initially, at my work, we were figuring out a way to leverage LLMs in a packaging pipeline to recover the workflow on the fly after a failure. This led to the development of Theow. Quickly after, I realized CI pipelines are basically sequential workflows that are self-contained and carry enough failure context. So I started using theow decorators to wrap my CI steps and let it automatically heal and create PRs to the feature branch.

It's different from solutions like Copilot (which also ties you to the platform) because theow lives inside your process and gets triggered on failure. What this means is that, for example, in an integration test, the LLM has the opportunity to investigate the actual environment and not just work off of the static error logs.

Theow is built on top of pydantic-ai and supports all the providers supported by pydantic-ai. On top of that, it also supports copilot-sdk, so you can use it with your Copilot subscription, and the claude-agent-sdk. It has observability built in with Logfire, so you can get the LLM telemetry directly in Logfire or use your own observability stack.

I use it to recover my projects' CI pipelines and plan to integrate it into my workplace's central CI. Here are some actual examples of theow at work (parrot is a test runner bot for CI that uses theow):

- Auto-healed lint and unit tests with PR fixes
- In-runner investigation and fix suggestions for an integration test

Theow is free and open source. Here is the repo: https://github.com/adhityaravi/theow.
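The mark-and-retry flow can be sketched as a plain decorator. A generic illustration of the pattern only: the `mark` name echoes `theow.mark()` but the signature is hypothetical, and the `diagnose` stub stands in for the LLM call that investigates the live environment.

```python
import functools

def mark(diagnose, retries=1):
    """Intercept exceptions, let `diagnose` adjust the call, then retry."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    if attempt == retries:
                        raise  # out of retries: surface the original error
                    # In theow this step is an LLM diagnosis with runtime
                    # context; here it just returns corrected kwargs.
                    kwargs = diagnose(exc, kwargs)
        return wrapper
    return decorator

def fix_missing_path(exc, kwargs):
    kwargs.setdefault("path", "/tmp/build")  # stubbed "healing" action
    return kwargs

@mark(diagnose=fix_missing_path)
def package(artifact, path=None):
    if path is None:
        raise RuntimeError("no output path configured")
    return f"{artifact} -> {path}"

result = package("wheel")  # fails once, is healed, then succeeds
```

Because the wrapper runs inside the failing process, the diagnosis step can inspect live state rather than static logs, which is the core of the argument above.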
Happy to get feedback, and even happier to support you if you wanna try it on your own workflow.

submitted by /u/__4di__
I built a full-stack SaaS in ~10 hours with Claude Code — paste a business name, get a deployed website in 60 seconds
I've been deep in Claude Code for a few months now and just shipped something I think shows what's actually possible with agentic development when you set it up right. Wanted to share the real workflow, not the hype.

What I built: Site Builder

Paste a business name, get a fully deployed website in 60 seconds. It scrapes Google Maps (Playwright + Chromium), writes all the copy (Claude Sonnet), generates images for sections without real photos (Gemini), assembles a React + Tailwind site from 14 components, and auto-deploys to Cloudflare Pages. Live URL returned instantly.

Live demo: https://site-builder-livid.vercel.app/

How Claude Code actually made this possible in a day

The game-changer: persistent expertise files. I maintain `expertise.yaml` files per domain (~600-1000 lines of structured knowledge). My WebSocket expert knows every event type, every broadcast method. My site builder expert knows every pipeline step, every model field. These load every session. By session 50, the agent knows your codebase like a senior engineer who's been on the team for a year. Session 1 vs session 50 is honestly night and day.

The workflow that compounds: I chain three agents in sequence — Plan (reads expertise + codebase, writes a spec), Build (implements the spec), Self-Improve (diffs the expertise against the actual code, finds discrepancies, updates itself). The system literally audits itself after every build cycle. It catches things like "you documented this method at line 142 but it moved to line 178" or "the builder added a new WebSocket event that isn't in the expertise yet."

Parallel agents are the real speed hack. When I need to update docs, scout for bugs, and build a feature, I launch all three simultaneously. Different files, different concerns, results back in minutes. I built four README files in the time it takes to write one. This is the biggest reason ~10 hours was enough for a full production system.

Opus for architecture, Sonnet for volume.
Pipeline design, multi-agent coordination, tricky debugging = Opus. Content generation, routine code, documentation = Sonnet. Match the intelligence to the task. You wouldn't hire a principal engineer to write boilerplate CSS.

The CLAUDE.md rules file is underrated. Mine enforces: Pydantic models over dicts, no mocking in tests (real DB connections), use Astral UV not raw Python, never commit unless asked, read entire files before editing. The agents follow these consistently because they're always in context. I've watched my agent catch itself mid-edit and switch from a dict to a Pydantic model because the rules said so.

What went wrong (because it's not all magic):

- TypeScript build failures on Railway because `tsconfig.json` was in my root `.gitignore` and never got committed for 2 of 3 templates. Took 3 deploys to figure out. Claude Code found it instantly once I SSH'd into the Railway container and let it look around.
- Franchise businesses (chains with multiple locations) break the scraper assumptions. Had to build a whole confidence scoring system — high/low/none — with franchise detection heuristics and editor warning banners.
- AI-generated images showed up on deployed sites but were broken in the editor preview. The editor uses iframe `srcdoc` (inlined HTML), so relative paths like `/images/services.png` don't resolve. Had to base64-encode them into the HTML bundle.
- TinyMCE required domain registration for every deployed site. Ripped it out and replaced it with a plain textarea. Sometimes simpler wins.
The stack (10 backend modules, 14 React components, 5 Vue components):

- Backend: Python 3.12, FastAPI, Pydantic v2, Playwright
- Frontend: Vue 3 + TypeScript + Pinia
- Generated sites: React + Tailwind CSS (14 section components)
- AI: Claude Opus 4.6 (orchestration) + Sonnet 4.6 (content) + Gemini 3.1 Flash (nano banana)
- Deploy: Docker + Railway (backend), Vercel (frontend), Cloudflare Pages (generated sites)
- Real-time: WebSocket streaming with progress panel

This is one of 7 apps in a monorepo called Agent Experts (credit u/indydevdan), built on the ACT > LEARN > REUSE pattern. Agents that actually remember and improve.

Now I need help. The builder works. Sites look like $5K custom builds. The workflow is: find business on Google Maps > generate site (60 sec) > customize in inline editor > sell for $500-$800. But I'm an engineer, not a GTM person. I'm looking for:

- Feedback — what would make this more valuable? What's missing?
- GTM partner/advisor — someone who's launched a SaaS or productized service agency. I need help with pricing model (per-site vs subscription vs white-label), distribution channels, and go-to-market strategy.
- Early users — if you do freelance web development or run a micro-agency, I'd love to let you try it and hear what breaks.

DMs open. Happy to share the expertise file patterns with anyone building with Claude Code — the persistent memory approach works regardless of what you're building.
Pydantic AI has a public GitHub repository with 15,963 stars.