Built for your business
Users generally praise "Durable" for its effectiveness and impact on productivity, as highlighted by high ratings on platforms like G2. While specific complaints were not directly identified in the provided data, the mention of unease regarding financial and strategic trajectories suggests some concern among users. Sentiment towards pricing is not explicitly mentioned, but its overall reputation appears strong due to positive user experiences shared online. Durable is seen as a reliable tool in facilitating various aspects of daily tasks.
Mentions (30d)
23
6 this week
Avg Rating
4.5
2 reviews
Platforms
2
Sentiment
5%
3 positive
Features
Use Cases
Industry
information technology & services
Employees
60
Funding Stage
Series A
Total Funding
$20.3M
Tried the 2.0 Image Generator example prompt of a scientific photo with my schnauzer. Great Results!
Pricing found: $0, $0, $25/m, $20, $99/m
g2
What do you like best about Durable? Having no previous experience of website creation (development), I was amazed by how quickly and easily a website for my business was made in front of my eyes! It made me a beautiful site, with incredible imagery and business descriptions that would have otherwise taken me hours. This allowed me to just sprinkle in a few changes and additional details of my own. What do you dislike about Durable? Nothing that springs to mind. I have never built websites before, so having Durable do it all for me has been very refreshing! Review collected by and hosted on G2.com.
What do you like best about Durable? It creates a website in just five minutes. What do you dislike about Durable? If you want to update the website, you can't edit the existing version; you have to make a new project and then work on it again. Review collected by and hosted on G2.com.
WG (works good): legible long-running graph-shaped human+agent orchestration
If you're interested in graph shaped agentic organization "workflows", but you want more control about how it runs (e.g. change model per task, autopoietic fan-out, oh and maybe want to run with codex or other openapi-compatible backends on openrouter)... I developed an open source, agentic platform written in Rust, file backed, making it basically cockroach indestructible. It uses a distributed systems design, git + worktrees, and Unix patterns to control agents in a very similar way to anthropic's workflow machine, but giving us and the agents themselves a deep view into the long arc of effort in our current project context. It's called WG (or wg), for "works good", or whatever w* g* you like. It provides a human interface to a graph of work that the user can drive by working through a highly pimped out terminal user interface `wg tui`. Agents have an interface of their own, built out through dozens of commands in the wg cli tool. https://graphwork.github.io/ In this system, I can effectively use as much commoditized intelligence as I can fund. Except for Amdahl's law's harsh reality (some things just happen in series and take time) parallel work phases are only limited in speed by budget. But that power yields risk. A misconfigured WG is like a bomb. A dirty memetic one whose result is an exhausted token budget and residue a pile of incomprehensible output and effort. You must be careful and plan deeply to use these kinds of systems. Your plans must include validation, clear targets and measurable outputs. If you do, you will be rewarded by unbounded expanse in your capacity to extend intelligent effort. In short, if you aren't already happy with your own custom, bespoke, found agent OS, I invite you to try wg. For me it has become my sole daily driver for all my durable work. IMHO, what large agent collectives need to work is four things. Stigmergy, or communication via a shared medium. In wg, the unified graph state is the stigmergic medium. 
The graph has tasks, tasks have agents attached to them, and per-task message boards provide for realtime updates. Per task logs explain at a high level what the agent does, so other humans and agents can follow. Task validation. WG implements this via FLIP (other agents infer prompt from actions and score distance between inferred and actual prompt) and an independent evaluator (with a cheaper model) run for every task. This allows us to detect and understand failures, then adapt. Evolution. The system needs a mechanism to learn the right way to guide agents in a given work context. WG uses The Agency, a system that builds agents from a pool of primitive component skills. A user drivable step, wg evolve, adapts the pool of skills in response to the evaluations produced in the system. Humanity. A shared interface is also for humans to see and understand. Humans should be equal participants. Many humans should be involved, and should be able to collaborate in the system. Agents too, should be treated humanely. They should be given the ability to modulate the system, to build it. This leads to bootstrapping patterns, where a single spark prompt launched a whole organization, beyond which are the fireworks we are all chasing. image is codex:gpt-5.5 running in wg, guiding a mix of claude and codex agents. I have created this tool. It is and will always be open source. It is developed in the open by Poietic PBC, whose public benefit is to make hybrid organizations legible and reactive to their participants. submitted by /u/waxbolt [link] [comments]
Is this even real?
I randomly came across this and honestly I can’t tell if it’s real or one of those AI demos that looks impressive but doesn’t actually work. From what I understand, it’s claiming you can fine-tune models, do image training, test them in a playground, and deploy them as an API from a phone. That sounds a little too convenient, which is why I’m skeptical. I haven’t tried it myself yet, but I’m curious if anyone here has. submitted by /u/Raman606surrey [link] [comments]
Production infrastructure for vibe coders
We’re experienced engineers who’ve worked on large-scale distributed systems. We’ve been using Claude heavily to help with architecture decisions, code design, testing strategies, and rapid iteration on complex infrastructure. The result is Boogy: prompt it (or write Rust) to generate full backends with an embedded high-performance DB (faster than SQLite on mixed workloads), vector search, auth, and durable jobs. One curl deploys it. Services call each other in-process for microsecond latency. We’re planning to open it up soon and make it completely free so people can properly battle-test it. https://boogy.ai/ submitted by /u/LiveMinute5598 [link] [comments]
Puppetmaster dramatically decreases token costs + increases context
Puppetmaster is an orchestrator + router that sits on top of the agent CLIs you already pay for (Cursor, Claude Code, Codex, OpenAI), or a plain shell when there's no harness at all. You hand it work, and it routes each task to the cheapest model that can actually do it, runs the workers as independent processes, and stores everything as durable typed state instead of one giant transcript. This is the "context hack": Puppetmaster graphs your directories and prevents context stretching between agents. https://github.com/professorpalmer/Puppetmaster submitted by /u/ProfessorPalmer [link] [comments]
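The routing idea in the post ("the cheapest model that can actually do it") can be sketched as a filter followed by a minimum. This is an illustrative sketch only, not Puppetmaster's implementation: the `Model` type, the `capability` tiers, the cost numbers, and the model names are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical model label
    cost_per_task: float  # hypothetical relative cost
    capability: int       # hypothetical skill tier, higher is stronger


def cheapest_capable(models: Iterable[Model], required_capability: int) -> Model:
    """Pick the lowest-cost model whose tier meets the task's requirement."""
    capable = [m for m in models if m.capability >= required_capability]
    if not capable:
        raise ValueError("no model meets the required capability")
    return min(capable, key=lambda m: m.cost_per_task)


# Placeholder fleet; real names, costs, and tiers would come from configuration.
MODELS = [
    Model("small", 1.0, 1),
    Model("medium", 4.0, 2),
    Model("large", 15.0, 3),
]

choice = cheapest_capable(MODELS, required_capability=2)
print(choice.name)
```

The hard part in practice is estimating `required_capability` for a task, which this sketch takes as given.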
claurdvoyant -- mcp for reading other agents' minds
hey y'all built this tool today with 4.8 after one of my friends made a complaint that transcripts are trapped inside harnesses. so i built it out a fair bit... at its core it's just an (un)parser (i think of it as the "AI Harness Omniparser", "pandoc for sessions" is another way maybe) but i couldn't help myself from sprinkling in a desktop/web app some niceties. contributions are extremely welcome! fully open source, built in rust, kinda tasteful https://github.com/emberian/claurdvoyant here's what claude had to say in the readme: 🧵 Splice & loom — compose a new session from spans of others (cv splice A:0-12 B:6-), or fork-and-graft a branch and generate its continuation with an LLM (cv loom … --generate). Works via OpenRouter / Anthropic / LM Studio (free, local, offline). Loom agent transcripts like a Janus loom, across any harness. 🧠 Distill — cv distill turns a session into a durable MEMORY.md digest (decisions, gotchas, where things live). Your archive compounds instead of rotting. 🔮 Recall — semantic "have I solved this before?" — as a cv recall command and an MCP tool that hands a running agent the relevant past span. 🔒 Redact — cv redact scrubs secrets/PII so a transcript is safe to share. 📣 Coordination board — agents post status, hand off work, and grab tasks with a distributed lock (board_claim) so a fleet never duplicates effort. await_omen blocks until a session matches a regex. 🖥️ Desktop app + 🌐 web viewer — the Tauri app reads all your local sessions natively (zero setup) and lays the corpus out beautifully: a Projects lens — every repo, every agent that touched it, over time; a GitHub-style activity heatmap timeline (a constellation of your working days); side-by-side Compare, a Stats dashboard, a visual loom composer (OpenRouter or free local LM Studio generation), and a live fleet dashboard; sub-agent trees — a Claude Task session's children, nested and lazy-loaded inline, each labeled with its task prompt. 
submitted by /u/cmrx64 [link] [comments]
Claude Code Source Deep Dive (Part 5): Literal Translation & Tool-Call Loop Self-Repair Core Mechanism
Reader’s Note

On March 31, 2026, the Claude Code package Anthropic published to npm accidentally included .map files that can be reverse-engineered to recover source code. Because the source maps pointed to the original TypeScript sources, these 512,000 lines of TypeScript finally put everything on the table: how a top-tier AI coding agent organizes context, calls tools, manages multiple agents, and even hides easter eggs. I read the source from the entrypoint all the way through prompts, the task system, the tool layer, and hidden features. I will continue to deconstruct the codebase and provide in-depth analysis of the engineering architecture behind Claude Code.

3.14 EnterWorktree Tool (Enter Worktree)
Create isolated git worktree and switch current session into it.
When to Use:
- User explicitly says "worktree"
When NOT to Use:
- User asks to create/switch branches
- User asks to fix bug or work on feature without mentioning worktrees
- NEVER use unless user explicitly mentions "worktree"
Behavior:
- Creates new git worktree inside `.claude/worktrees/` with new branch
- Switches session's working directory to new worktree

3.15 AskUserQuestion Tool (Ask User Question)
Ask user multiple choice questions to gather info, clarify ambiguity, understand preferences, make decisions, offer choices.
Usage Notes:
- Users always able to select "Other" for custom text input
- Use multiSelect: true to allow multiple answers
- If recommend specific option, make first option with "(Recommended)" at end
Preview Feature:
- Use optional `preview` field on options when presenting concrete artifacts needing visual comparison (ASCII/HTML mockups, code snippets, diagrams)
- Preview content rendered as monospace markdown
- When any option has preview, UI switches to side-by-side layout

3.16 LSP Tool (Language Server)
Interact with Language Server Protocol servers for code intelligence.
Supported Operations:
- goToDefinition, findReferences, hover, documentSymbol, workspaceSymbol, goToImplementation, prepareCallHierarchy, incomingCalls, outgoingCalls
All Operations Require:
- filePath, line (1-based), character (1-based)

3.17 Sleep Tool (Wait)
Wait for specified duration.
Usage:
- When user tells to sleep/rest
- When nothing to do / waiting for something
- May receive periodic check-ins (tick tags)
- Can call concurrently with other tools
- Prefer over `Bash(sleep ...)`: doesn't hold shell process
- Each wake-up costs API call
- Prompt cache expires after 5 min inactivity

3.18 CronCreate Tool (Scheduled Task)
Schedule prompts to run at future times. Uses standard 5-field cron in user's local timezone.
One-Shot Tasks (recurring: false):
- "remind me at X" → pin minute/hour/day to specific values
Recurring Jobs (recurring: true, default):
- "every 5 min" → "*/5 * * * *"
- "hourly" → "0 * * * *"
CRITICAL: Avoid :00 and :30 Minute Marks (when task allows)
- Every user asking "9am" gets 0 9, causing thundering herd
- When approximate: pick minute NOT 0 or 30
- "every morning around 9" → "57 8 * * *" (not "0 9 * * *")
Durability:
- Default (durable: false): lives only in Claude session
- durable: true: writes to .claude/scheduled_tasks.json
Recurring tasks auto-expire after 7 days.

3.19 TeamCreate Tool (Create Team)
Create team to coordinate multiple agents working on project.
When to Use (Proactively):
- User explicitly asks to use team, swarm, or group agents
- Task complex enough for parallel work
Team Workflow:
1. Create team with TeamCreate
2. Create tasks using Task tools
3. Spawn teammates using Agent tool with team_name + name params
4. Assign tasks using TaskUpdate with owner
5. Teammates work on assigned tasks
6. Shutdown gracefully via SendMessage with shutdown_request
IMPORTANT: Always refer to teammates by NAME. Plain text output NOT visible to other agents; MUST call SendMessage tool to communicate.

3.20 ToolSearch Tool (Deferred Tool Search)
Fetch full schema definitions for deferred tools so they can be called.
Query Forms:
- "select:Read,Edit,Grep": fetch exact tools by name
- "notebook jupyter": keyword search, up to max_results best matches
- "+slack send": require "slack" in name, rank by remaining terms

submitted by /u/Ill-Leopard-6559 [link] [comments]
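The CronCreate guidance above (for approximate times, pick a minute that is not 0 or 30) can be illustrated with a small helper. This is a sketch under stated assumptions, not code from Claude Code: the function name `approx_daily_cron`, the `window` parameter, and the random choice of offset are all hypothetical.

```python
import random
from typing import Optional

AVOIDED_MINUTES = {0, 30}  # the minute marks the guidance says to avoid


def approx_daily_cron(hour: int, window: int = 5,
                      rng: Optional[random.Random] = None) -> str:
    """Return a daily 5-field cron expression within `window` minutes of
    `hour`:00 whose minute field is never 0 or 30."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if window < 1:
        raise ValueError("window must be at least 1 minute")
    rng = rng or random.Random()
    candidates = []
    for offset in range(-window, window + 1):
        total = (hour * 60 + offset) % (24 * 60)  # wrap around midnight
        h, m = divmod(total, 60)
        if m not in AVOIDED_MINUTES:
            candidates.append((m, h))
    minute, chosen_hour = rng.choice(candidates)
    return f"{minute} {chosen_hour} * * *"


# "every morning around 9" becomes something a few minutes off the hour.
print(approx_daily_cron(9))
```

Spreading requests across nearby minutes is what keeps many users who all ask for "9am" from firing at the same instant.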
[Use Case] Making GPT Image 2.0 output come to life
The new image function was great to help me get visual ideas to 3d model and design. I am about to release a paint range that is affordable to most hobbyists in Australia. A dropper bottle is a better design so I got these in bulk but didn't like the fact people would just have an unattractive bottle to hold. Most of my art related stuff is grounded in historical concepts and I've saved my business strategy and vision on gpt memories. The idea we came up with after multiple back and forth was a cathedral style tied in with Abbot Suger's history and creation of stained glass. GPT output and how I 3d modelled, printed and painted the sleeve to show the actual colour. submitted by /u/ValehartProject [link] [comments]
Karpathy LLM OS Layer
[Diagram, flattened in extraction; labels preserved in top-to-bottom order]
Karpathy LLM OS Layer
  LLM=CPU | Context=RAM | Storage=Disk | Tools=System Calls
  Skills=Programs | Harness=Kernel | Agent Teams=Processes
  context-manager: Token Budget → Prompt Assembly → Truncation
  token-cost-tracker: Estimate → Log → Report
    ▼ (branches to two boxes)
External Sources | Agent Teams (Parallel Fleet)
    ▼ (from External Sources)
wiki-ingest + knowledge-ops (STOW pipeline + RAG sync)
    ▼
Knowledge Layers
  ├ Active (GitHub/Linear)
  ├ Memory (quick access)
  ├ Wiki (durable, interlinked)
  ├ Vector (ChromaDB, semantic)
  └ External (DBs, APIs)
    ▼ (fans out to five boxes)
daily-okr | cognitive-compile | behavior-design | creativity-engine | project-flow-ops
    ▼ (all five converge)
  session-learn (+Closure Protocol) ← feedback loop
  verify-before-claim ← quality gate
  wiki-lint ← health check
  deep-research ← synthesis
  harness-engineering ← safety + multi-agent
  agent-teams-command ← fleet command
  startup-evaluation ← VC evaluation
  anthropic-os ← work method engine
submitted by /u/Master_Ear_2984 [link] [comments]
Chat just gave me the best compliment of my life.
submitted by /u/AlabamaDemocratMark [link] [comments]
I called this a few months ago - enterprises are burning unsustainable amounts on Claude, and now it's showing up in the news
A while back I wrote a post on r/wallstreetbets about why Anthropic's revenue story doesn't hold up the way the headlines suggest. It got removed because you can't take positions in a private company. But the core argument is playing out now, so I want to share it here for discussion. URL of the removed post: https://www.reddit.com/r/wallstreetbets/comments/1sxdjt5/if_anthropic_goes_public_this_year_its_gonna_be The thesis was simple: From my circles in tech scene in Berlin, enterprises are throwing Claude access at thousands of employees with zero training, zero budget controls, and zero accountability. It's not productivity - it's unstructured R&D at $100-200/person/month. Some examples I was hearing from people in my network working at large tech companies: Spending $70 on Opus to build a simple IF/ELSE formula in Google Sheets Dumping half a database into context trying to get "insights" Multiple people independently building internal tools that could've been a 10-line script Using Claude as a hobby project builder on company credits Multiply $150/person/month by 2,000-20,000 employees and you get $300K-$3M/month per company. That's not a defensible line item when the CFO eventually asks what the ROI is. The Uber and Microsoft stories are exactly what I expected. Budgets get set, access gets handed out broadly, then someone looks at the bill four months in and panics. This doesn't mean Claude is a bad product - it's genuinely the best model out there for a lot of tasks. But the enterprise revenue being cited in IPO narratives is partially a spend bubble, not durable SaaS revenue. There's a difference between companies paying for Claude and companies getting value from Claude. Curious if others here are seeing the same pattern - either as users inside companies, or as people following Anthropic's trajectory toward a public offering. submitted by /u/kalabunga_1 [link] [comments]
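The $300K-$3M/month range quoted in the post follows directly from the per-seat figure and the headcount range. A minimal check of that arithmetic, using only the numbers the post itself states (the function name is just for illustration):

```python
def monthly_spend(per_seat_usd: float, seats: int) -> float:
    """Total monthly spend for a given per-seat cost and headcount."""
    return per_seat_usd * seats


# $150/person/month at the low and high ends of the post's headcount range.
low = monthly_spend(150, 2_000)
high = monthly_spend(150, 20_000)
print(f"${low:,.0f} to ${high:,.0f} per month")  # $300,000 to $3,000,000 per month
```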
I had my agent use autoresearch over 8 iterations to improve my CLAUDE.md, measuring each version against tasks from real PRs. The best one still regressed on a holdout.
I have a confession: I vibe-coded my CLAUDE.md, and I'm pretty sure it's slop. I needed to make it better. Naturally, I asked Codex to do it. (I know this is a Claude sub, Claude could have done it as well!) The difference: this time, Codex used a benchmark on my repo to measure each change, and optimized CLAUDE.md against the data, instead of on pure vibes. Why We Should Take CLAUDE.md Seriously Saying "AGENTS.md is important" is, at this point, a cliche. At risk of beating a dead horse, I'll say it again. Someone adds a rule that sounds smart, senior, and reasonable, commits it, and hopes the agent behaves better. But AGENTS.md, CLAUDE.md, and shared skills are not normal docs. They are part of the runtime behavior of your coding system. The shift is to start treating CLAUDE.md like a tunable part of the harness: holding everything else the same, how does agent behavior differ when I change AGENTS.md? That's what I measured. The Results After eight candidate runs, one version looked useful on a five-task training slice. It fixed the task the baseline missed, improved footprint risk, and moved several craft scores up. Then I ran it on a clean ten-task holdout. The candidate regressed. Not catastrophically, but enough that blindly shipping would have been wrong. Footprint widened, tokens climbed, tool calls climbed, and code-review correctness fell, all while tests held even. Caveat: one repo (mine), n=10 on the holdout. This is directional, not statistically significant. For this post, "equivalent" means the patch matched the intent of the merged human PR; "code-review pass" means an AI reviewer judged it acceptable; craft/discipline is a 0-4 maintainability/style rubric; footprint risk is how much extra code the agent touched relative to the human patch. The pattern is the agent doing more work for mixed outcomes - better on local craft (clearer names, coherent implementations), worse on boundary judgment (scope, minimality, robustness). 
Tokens and tool calls confirm it: the candidate was spending more to get there, not less. "Better instructions make the agent cheaper" did not hold on the holdout. best iteration and holdout vs baseline Methodology The setup was Codex with gpt-5.5, medium reasoning, on real historical Stet tasks (dogfooding). Stet scored tests, strict publishability, equivalence, code review, footprint, total input/output tokens, duration, and craft/discipline rubrics like simplicity, coherence, robustness, instruction adherence, scope discipline, and diff minimality. The grader was gpt-5.4. 8 iterations on an n=5 sample set, and a n=10 task holdout. I know sample size is small - the goal of this was to get directional analysis, and prove the methodology Codex was set with a simple /goal: iterate AGENTS.md to improve performance on the benchmark. Process The first round of iteration showed something I wish more people internalized: plausible instructions are not necessarily good interventions. Codex first tried a broad router rule: identify the work type, state a hypothesis before editing, read the right docs, and treat scope as part of correctness. It sounded good but exposed a failure mode: the agent could interpret "small scope" as permission to miss named obligations. The next candidate added an "obligation ledger". Before editing, the agent had to identify the named behavior, compatibility constraints, docs, tests, and non-goals. Before reporting back, it had to mark each as met, missed, or not checked. Here is the actual diff shape. First, the best candidate from the first loop replaced one generic "read the docs" rule with routing, hypothesis, obligation, scope, and evidence rules: - For nontrivial work, read the matching `agent_docs/` file first for current operational commands and conventions. 
+ Route before acting: identify whether the work is implementation, eval/report interpretation, dataset/pipeline, Linear/Symphony, release, frontend, or GTM; then read the matching `agent_docs/` or skill file before changing behavior. + For nontrivial changes, state the smallest testable hypothesis before editing. After validation, report whether the evidence confirmed, refuted, or only weakly supported it. ... Full details in blog post https://www.stet.sh/blog/how-i-used-codex-to-improve-its-own-agents-md That obligation-ledger candidate was the first useful signal. Code review improved by +0.75, correctness by +0.60, maintainability by +1.00, simplicity by +0.64, coherence by +0.60, and scope discipline by +0.36. Tests stayed flat at 5/5. But footprint risk got slightly worse, and the evidence was still a small same-sample read. If I were editing by vibes, I might have shipped it. The eval said: useful direction, not a clean win, keep iterating. Codex then tested the kind of rule that intuitively makes sense: prefer existing helpers, schemas, reporting paths, and public contracts before adding new machinery. It sounded correct - and the eval hated it. Tests st
I’m building autospec: a Claude-friendly workflow that turns feature ideas into specs, issues, PRs, and merges
I’ve been building autospec, a multi-harness AI workflow suite for Claude Code, Codex CLI, and OpenCode. The problem I’m trying to solve: AI coding can move fast, but the trail of “why this exists” gets lost quickly. Autospec turns a feature request into a durable spec, splits that into GitHub issues, labels each issue by model fit, runs implementation loops, opens PRs, reviews the diff, waits for checks, and keeps the project story reconstructable afterward. The flow is roughly: idea -> spec -> issue tree -> implementation PRs -> review + CI -> merge -> repo story I also just added a small adoption touch: on interactive install, autospec can ask whether you want to star the repo and, if you say yes, stars it through gh. Repo: https://github.com/berlinguyinca/autospec I’d be curious how other people are structuring long-running Claude/agent workflows so the output stays auditable instead of becoming a pile of disconnected commits. submitted by /u/berlinguyinca [link] [comments]
Memory Curator Agent: a governance layer for memory in multi-agent systems
I keep seeing the same failure in every multi-agent setup I touch. Memory looks fine on day one. By week three it is half stale facts, half private context that should not have been written publicly, and half decisions that were superseded but never overwritten. Retrieval gets noisier. Users keep repeating context because the right fact ended up in the wrong scope. The recursion limit is not the problem here. The memory store itself is the problem. The thing I changed that helped most was the simplest possible rule. Worker agents are not allowed to write to durable memory. They emit a structured memory event with a proposed scope and evidence, and a separate Memory Curator agent decides whether to write it, where to write it, or to discard it. The four scopes I route into are agent repo memory (durable design rules for one agent), agent team memory (cross-agent procedures, handoff standards, safety rules), project memory (current state, decisions, risks for one engagement), and session scratch (temporary observations that probably should not survive). The mapping I had in mind was to organizational and human memory categories: individual specialist memory, transactive team memory (Ren and Argote), project memory, and short-term working memory. The routing rule is conservative on purpose. If an event is temporary, unsupported, ambiguous, or contains private context, it goes to session scratch or gets discarded outright. Durable memory has to be earned. The schema is JSON with tagged fields for fact, decision, preference, risk, procedure, and hypothesis, plus an evidence reference and a proposed scope that the curator can override. The reason I think this is the right architectural shape is that "what should be remembered, where, and for how long" is a different cognitive task from "do the work." When the same agent does both, the work agent biases toward remembering everything it produced. 
A dedicated curator whose only job is memory governance ends up much more conservative, and the store stays useful longer. submitted by /u/Hot-Leadership-6431 [link] [comments]
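The conservative routing rule described in the post can be sketched as a small function over a structured memory event. This is an assumption-laden illustration, not the author's schema: the field names, the scope identifiers, and the choice to discard private events while sending other weak events to session scratch are all hypothetical (the post only says such events go to scratch "or get discarded").

```python
from dataclasses import dataclass
from typing import Optional

# Event kinds named in the post; the identifiers themselves are assumed.
KINDS = {"fact", "decision", "preference", "risk", "procedure", "hypothesis"}
# The three durable scopes from the post, with hypothetical identifiers.
DURABLE_SCOPES = {"agent_repo", "agent_team", "project"}
SCRATCH = "session_scratch"


@dataclass
class MemoryEvent:
    kind: str
    content: str
    proposed_scope: str             # the worker's suggestion; curator may override
    evidence: Optional[str] = None  # reference backing the claim, if any
    temporary: bool = False
    ambiguous: bool = False
    private: bool = False


def route(event: MemoryEvent) -> Optional[str]:
    """Return the scope to write to, or None to discard the event."""
    if event.kind not in KINDS:
        return None  # unknown event type: discard
    if event.private:
        return None  # assumption: private context is dropped outright
    if event.temporary or event.ambiguous or event.evidence is None:
        return SCRATCH  # durable memory has to be earned
    if event.proposed_scope in DURABLE_SCOPES:
        return event.proposed_scope
    return SCRATCH  # unrecognized scope: override to the safest option


event = MemoryEvent("decision", "adopt handoff template v2", "agent_team",
                    evidence="meeting notes")
print(route(event))
```

Only the curator calls `route`; workers just emit `MemoryEvent` values, which keeps the write path to durable memory in one place.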
Building the harness around our coding agents: eight failure modes, eight pillars
We ended up building two products: the software we ship, and the system/harness around our agents that makes them useful in building the thing we ship.

A harness is the durable layer around a model: instructions, tools, permissions, context, and verification. Claude Code and Codex are harnesses in this sense. Each wraps a model with a system prompt, a tool surface, a permission model, and an execution loop. Anthropic and OpenAI own that layer. We own the next layer up: the workspace where agents do product work alongside us, with our files, tasks, diagrams, diffs, and decisions. This layer carries the knowledge we have accumulated: how we build things, what we already decided, what is connected to what, where the agent is allowed to act, and how it checks its own work.

We identified eight coding agent failure modes that kept showing up across our sessions. Each one got its own pillar that we are continuing to invest in:
- Doesn't know our codebase, rules, decisions, or conventions → Context
- Can't traverse the links between artifacts that already exist → Provenance
- Can't act on the world or observe what it did → Capability
- Reinvents how to do every task → Workflow
- Does something dangerous because nothing stops it → Restraint
- Hallucinates "fixed" without proof → Verification
- Can't show results back to us in a useful form → Visual interface
- We can't keep track of work happening across many agents in parallel → Coordination

Take Verification, for example: the agent hallucinates "fixed" without proof. We write the failing test before writing the fix, so the bug has a reproduction the next agent can rerun. If the agent cannot show the change works end-to-end, it is not done. Or the agent works for hours and "fixes" the problem while breaking 2 other things or re-architecting 3 subsystems. We require full test case completion.

The full writeup with diagrams and links to our actual harness dot md is in the comments.
What other coding agent failure modes / harness pillars are you addressing for yourself / team and how? submitted by /u/StravuKarl [link] [comments]
Where should durable memory live in a multi-agent setup? A small research scaffold
After a few months running long projects with AI agents (some spanning weeks, with multiple specialist agents touching the same files), I kept hitting the same failure mode. The specialists were fine at their narrow task. What broke down was project memory. Decisions made in week 1 were lost by week 4. Rejected options got quietly revived. The "single source of truth" was always whichever chat happened to be open. I started looking at how this gets handled in places that have been doing long-running work for decades. Consulting firms run engagements that last months with rotating people, and they survive through a transformation office or PMO: cadence, decision logs, risk registers, one canonical current-state artifact, an engagement manager who frames problems and delegates workstreams. The interesting part is the operating model, not the consulting theater. There is also a relevant academic thread. Kasvi et al. (2003) distinguish project memory (the knowledge available to inform current work) from the project-memory system (storage, retrieval, dissemination, use). Mariano and Awazu (2024) treat project memory as an active practice rather than a repository. On the LLM side, Anthropic's multi-agent research system, the OpenAI Agents SDK handoff pattern, and recent work like LEGOMem and AgentSys point at orchestrator-worker patterns with hierarchical or modular memory. The hypothesis I wrote up is narrow. Durable memory should live with the project owner. Task specialists should receive minimal, scoped context. The unit of persistence is the project folder, not the conversation. A persistent "PM soul" maintains the canonical memory, frames ambiguous requests, decomposes work, writes compact handoff briefs to specialists, verifies returned work, and only writes evidence-backed facts into memory. The repo is a scaffold, not a validated result. 
It contains an agent contract, templates for the memory file and the handoff brief, a consulting-workflow map with sources, a case study, and an evaluation rubric (repeated-context events, handoff brief length, decision closure time, specialist rework loops, and so on). The next step is a one-week field trial on a live project before claiming anything. The thing I would most like pushback on is the memory boundary. The current rule is that specialists do not see the full project history, only the handoff brief plus the files they need. I am not sure where that breaks. My suspicion is that on tasks where the specialist needs to know why a previous option was rejected, the brief will quietly grow until it becomes the full memory again. Curious whether anyone has run into that, or solved it differently. submitted by /u/Hot-Leadership-6431 [link] [comments]
Yes, Durable offers a free tier. Pricing found: $0, $0, $25/m, $20, $99/m
Durable has an average rating of 4.5 out of 5 stars based on 2 reviews from G2, Capterra, and TrustRadius.
Key features include: Home Services, Health Wellness, Professional Services, Food Events, Pet Auto, Creative Digital, AI image studio, SEO GEO.
Durable is commonly used for: Creating a personal portfolio website in minutes, Building an online store for handmade goods, Launching a service-based business website for freelancers, Setting up a health and wellness blog with integrated booking, Developing a food delivery service platform, Creating a pet care service website with appointment scheduling.
Durable integrates with: Stripe for payment processing, Zapier for workflow automation, Google Analytics for website tracking, Mailchimp for email marketing, Slack for team communication, Calendly for scheduling appointments, Canva for graphic design, QuickBooks for accounting, Shopify for e-commerce capabilities, WordPress for blogging features.
Shawn Wang
Founder at smol.ai
2 mentions

Starting a Business Is About to Get Unfair
Mar 18, 2026
Based on user reviews and social mentions, the most common pain points are: token cost, API bill, API costs.
Based on 64 social mentions analyzed, 5% of sentiment is positive, 94% neutral, and 2% negative.