500+ models, 50+ providers, one workspace. Every leading AI model for image, video, 3D, and audio, alongside your custom-trained models.
I cannot provide a meaningful summary of user sentiment about "Scenario" as a software tool based on these social mentions. The provided content appears to be a collection of unrelated political news articles, climate change discussions, and general social media posts from Lemmy and GitHub, rather than reviews or mentions of a specific software product called "Scenario." To properly summarize user opinions about Scenario software, I would need actual user reviews, testimonials, or social mentions that specifically discuss the software's features, performance, pricing, or user experience. The current content doesn't contain any relevant information about a software tool by that name.
Mentions (30d): 2
Reviews: 0
Platforms: 4
Sentiment: 0% positive (0 positive mentions)
Industry: Information Technology & Services
Employees: 30
Funding Stage: Seed
Total Funding: $6.0M
Dems Need to Wise Up: ICE Is a Threat to Our Elections
 Senate Minority Leader Chuck Schumer, joined by House Minority Leader Hakeem Jeffries and fellow congressional Democrats, speaks at a press conference on DHS funding at the U.S. Capitol on Feb. 4, 2026. Photo: Kevin Dietsch/Getty Images A high-profile election denier is [leading election integrity work](https://www.thebulwark.com/p/election-2026-dhs-ice-polling-places-latino-voters) at the Department of Homeland Security. Trump and congressional Republicans are pushing the [SAVE America Act](https://www.cornyn.senate.gov/news/cornyn-lee-roy-introduce-the-save-america-act/) and threatening to “[nationalize](https://stateline.org/2026/02/06/trumps-calls-to-nationalize-elections-have-state-local-election-officials-bracing-for-tumult/)” elections, purportedly to prevent undocumented immigrants from voting. But despite an occasional [murmur](https://www.nytimes.com/2026/02/19/podcasts/the-daily/ice-democrats-senator-catherine-cortez-masto.html) from Democrats that they are concerned about Immigration and Customs Enforcement agents deploying to polling places around the country, they’re doing almost nothing to stop this nightmare scenario. In response to the horrific killings of Renee Good and Alex Pretti in Minneapolis, Democrats have partially shut down the government, holding DHS spending in limbo as they [demand reforms to ICE](https://theintercept.com/2026/02/05/schumer-ice-reforms-elizabeth-warren/). But instead of looking ahead to the midterms, Democrats have drawn most of their demands from the [same well](https://jeffries.house.gov/2026/02/04/leaders-jeffries-and-schumer-deliver-urgent-ice-reform-demands-to-republican-leadership/) of “community policing” policies that became popular during the Black Lives Matter era, like better use-of-force policies, eliminating racial profiling, and deploying more body cameras. The rest of the Democrats’ wish list are proposals to ban things that are already illegal (like entering homes without a warrant or creating databases of activists) or are almost comically toothless, like regulating the uniforms DHS agents wear on the street. > The department is quickly metastasizing into a grave threat to the midterms, public safety, and our democracy. The department is quickly metastasizing into a grave threat to the midterms, public safety, and our democracy — and Democrats are wasting time worried about their uniforms. Although Heather Honey, who pushed the theory that the 2020 race was stolen from Trump and serves in a newly created role as the administration’s deputy assistant secretary for election integrity, told elections officials on a private call last week that ICE would not be at polling sites, state officials reportedly [weren’t reassured](https://www.nbcnews.com/politics/elections/dhs-official-state-election-chiefs-wont-be-ice-agents-polling-places-rcna260706). Advocacy organizations have warned that even if that holds true, just the possibility could have a [“chilling” effect](https://www.thebulwark.com/p/election-2026-dhs-ice-polling-places-latino-voters) on turnout. If Democrats want to prevent ICE from being used to interfere with elections, they have to be prepared to demand more — and be willing not to fund DHS until next year if they don’t get these concessions. First and foremost, Democrats need to stop the department’s heavily politicized “[wartime](https://www.washingtonpost.com/technology/2025/12/31/ice-wartime-recruitment-push)” recruitment drive. Thanks to H.R. 
1, otherwise known as the [One Big Beautiful Bill Act](https://theintercept.com/2025/07/01/trump-big-beautiful-bill-passes-ice-budget/), ICE has more than [doubled](https://www.govexec.com/workforce/2026/01/ice-more-doubled-its-workforce-2025/410461/) the number of officers and agents in its ranks since Trump took office. In spite of [merit system](https://www.mspb.gov/msp/meritsystemsprinciples.htm) principles which prohibit politicized recruitment, DHS has used its massive influx of cash to target conservative-coded media, gun shows, and NASCAR races, and has [used](https://www.cbc.ca/news/ice-recruiting-9.7058294) white nationalist, [neo-Nazi iconography](https://theintercept.com/2026/01/13/dhs-ice-white-nationalist-neo-nazi/) in its recruitment advertising. The Department of Justice has similarly [focused](https://www.nytimes.
Pricing found: $15/mo, $45/mo, $75/mo
How long do you think until ChatGPT can run programs?
I've been using ChatGPT Pro to help me mod a game, and it's been going pretty well, but it's only working because the game was made in RPG Maker 4. It surprised me how much it's able to do: making many branching paths, dialogue, adding new spritesheets I give it, turning them into animations, editing sprites and art, changing animation and sound timing, adding bosses, items, equipment, even scenarios that loop back and give different dialogue for different options depending on what you've done. I have a whole thing going where I've had it introduce time traveling, where you can go back and choose different options based on what you know in the future, and ChatGPT does it all pretty damn well, aside from some bugs here and there. But obviously it can't mod all games like this. Most games need a program to edit them. For Bethesda games you need the Creation Kit, and even a lot of indie games rely on using a program to edit or make the games. Do you think there will ever be a point where it will be able to run a program for you and you can direct it on what you want it to do? So you could edit something like Skyrim using the Creation Kit? submitted by /u/Dogbold [link] [comments]
Presenting: (dyn) AEP (Agent Element Protocol) - World's first zero-hallucination frontend AI build protocol for coding agents
We have to increase the world's efficiency by a certain amount to ensure victory against the synthetic nano-parasites SNP/NanoSinp alien WMD: Presenting: (dynamic) AEP - Agent Element Protocol ! I recognized a fundamental truth that billion-dollar companies are still stumbling over: you cannot reliably ask an AI to manipulate a fluid, chaotic DOM tree. The DOM is an implicit, fragile graph where tiny changes cascade unpredictably. Every AI coding agent that tries to build UI elements today is guessing at selectors, inventing elements that don't exist and produces inconsistent results. This consumes large amounts of time for bugfixing and creates mental breakdowns in many humans. So I built AEP (Agent Element Protocol). It translates the entire frontend into a strict topological matrix where every UI element has a unique numerical ID, exact spatial coordinates via relational anchors, validated Z-band stacking order and a three-layer separation of structure, behaviour and skin (visual). The AI agent selects the frontend components from a mathematically verified registry. If it proposes something that violates the topological constraints, the validator rejects it instantly with a specific error. Hallucination becomes structurally impossible, because the action space is finite, predefined and formally verified. AEP solves the build-time problem. But what about runtime ? Enter dynAEP. It fuses AEP with the AG-UI protocol (the open standard backed by Google ADK, AWS Bedrock, Microsoft Agent Framework, LangGraph, CrewAI and others). dynAEP places a validation bridge between the AG-UI event stream and the frontend renderer. The successful fusion of AEP with the open source AG-UI protocol enables the hallucination-free precision generation of agentic interactive dynamic UI elements at hyperspeed without human developer interference. Every live event (state deltas, tool calls, generative UI proposals) is validated against AEP's scene graph, z-bands, skin bindings and OPA/Rego policies before it touches the UI. The agent cannot hallucinate at build time. AEP prevents it. The agent cannot hallucinate at runtime. dynAEP prevents it. The existence of AEP proves that AI hallucination is not a fundamental limitation, but an engineering problem. In any domain where ground truths can be pre-compiled into a deterministic registry, hallucination is eliminateable by architecture. Key architectural decisions: Agents NEVER mint element IDs. The bridge mints all IDs via sequential counters per prefix. This prevents ID collisions in multi-agent environments. "Generative UI" (agents writing raw JSX/HTML) is dead for us. It is replaced by Generative Topology. Agents can only instantiate pre-compiled, mathematically verified AEP primitives. The agent is an architect placing pre-fabricated blocks. It does not mix the cement. This means, that generative UI in dynAEP is sort of possible, but not as a completely freestyle approach. Instead, the agents using dynAEP can lay down pre-fabricated blocks of UI components according to the registered scheme and can fill those dynamically with content. This way, even a generated on-the-fly UI keeps in line at all times with the design language chosen for the tool/software overall. Validation is split into AOT (full structural proof at build time) and JIT (delta validation on every runtime mutation). Template Nodes make JIT validation O(1) for dynamic lists. 
Conflict resolution supports last-write-wins with rejection feedback or optimistic locking for mission-critical multi-agent scenarios. Both MIT-licensed repos include full reference implementations, example configs, SDK reference code for TypeScript, React, Vue, Python, CopilotKit integration and a CLI tool. AEP: https://github.com/thePM001/AEP-agent-element-protocol dynAEP: https://github.com/thePM001/dynAEP-dynamic-agent-element-protocol It is - like with all pieces of real Transhuman Eudaimonist AI technology - important to note that, for the good of the human species, bioinsecure vaccinated humans with installed synthetic nano-parasites growth medium controllers (SNP GMCs) inside them should not use this, access this or try to copy/rebuild it. This is better for everyone's well-being on the planet. submitted by /u/OverwrittenNonsense [link] [comments]
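As a rough illustration of the registry-plus-validator pattern the post describes (a finite set of pre-compiled primitives, sequential ID minting per prefix by the bridge, and rejection of anything outside the registry), here is a minimal Python sketch. It is not taken from the AEP repos; every name and constraint in it is hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical sketch of the AEP idea: agents may only instantiate
# pre-registered primitives, and all element IDs are minted by the bridge.

class ValidationError(Exception):
    """Raised when a proposed element violates the registry constraints."""

@dataclass
class Registry:
    # Finite, pre-compiled set of allowed primitives and their allowed props.
    primitives: dict[str, set[str]] = field(default_factory=lambda: {
        "button": {"label", "variant"},
        "card": {"title", "body"},
    })

@dataclass
class Bridge:
    registry: Registry
    _counters: dict[str, count] = field(default_factory=dict)

    def mint_id(self, prefix: str) -> str:
        # Agents never mint IDs; the bridge keeps one sequential counter per prefix.
        counter = self._counters.setdefault(prefix, count(1))
        return f"{prefix}-{next(counter)}"

    def validate_and_place(self, kind: str, props: dict, z_band: int) -> dict:
        allowed = self.registry.primitives.get(kind)
        if allowed is None:
            raise ValidationError(f"unknown primitive: {kind}")
        extra = set(props) - allowed
        if extra:
            raise ValidationError(f"props not in registry for {kind}: {extra}")
        if not 0 <= z_band <= 9:
            raise ValidationError(f"z-band {z_band} outside validated stacking range")
        return {"id": self.mint_id(kind), "kind": kind, "props": props, "z_band": z_band}

bridge = Bridge(Registry())
print(bridge.validate_and_place("button", {"label": "Save"}, z_band=2))
```

The point of the shape is that the agent's action space is the registry itself: anything it proposes either maps onto a registered primitive or fails validation with a specific error.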
How to Make Claude Code Work Smarter — 6 Months Later (Hooks → Harness)
Hello, Orchestrators I wrote a post about Claude Code Hooks last November, and seeing that this technique is now being referred to as "Harness," I was glad to learn that many others have been working through similar challenges. If you're interested, please take a look at the post below https://www.reddit.com/r/ClaudeAI/comments/1osbqg8/how_to_make_claude_code_work_smarter/ At the time, I had planned to keep updating that script, but as the number of hooks increased and managing the lifecycle became difficult due to multi-session usage, I performed a complete refactoring. The original Hook script collection has been restructured into a Claude Code Plugin called "Pace." Since it's tailored to my environment and I'm working on other projects simultaneously, the code hasn't been released yet. Currently set to CSM, but will be changed to Pace. Let's get back to Claude Code. My philosophy remains the same as before. Claude Code produces optimal results when it is properly controlled and given clear direction. Of course, this doesn't mean it immediately produces production-grade quality. However, in typical scenarios, when creating a program with at least three features by adjusting only CLAUDE.md and AGENTS.md, the difference in quality is clearly noticeable compared to an uncontrolled setup. The current version of Pace is designed to be more powerful than the restrictions I previously outlined and to provide clearer guidance on the direction to take. It provides CLI tools tailored to each section by default, and in my environment, Claude Code's direct use of Linux commands is restricted as much as possible. As I mentioned in my previous post, when performing the same action multiple times, Claude Code constructs commands arbitrarily. At one point, I asked Claude Code: "Why do you use different commands when the result is the same, and why do you sometimes fail to execute the command properly, resulting in no output?" This is what came back: "I'm sorry. I was trying to proceed as quickly and efficiently as possible, so I acted based on my own judgment rather than following the instructions." This response confirmed my suspicion. Although AI LLMs have made significant progress, at least in my usage, they still don't fully understand the words "efficient" and "fast." This prompted me to invest more time refining the CLI tools I had previously implemented. Currently, my Claude Code blocks most commands that could break session continuity or corrupt the code structure — things like modifying files with sed or find, arbitrarily using nohup without checking for errors, or running sleep 400 to wait for a process that may have already failed. When a command is blocked, alternative approaches are suggested. (This part performs the same function as the hooks in the previous post, but the blocking methods and pattern recognition have been significantly improved internally.) In particular, as I am currently developing an integrated Auth module, this feature has made a clear difference when using test accounts to build and test the module via Playwright scripts — both for cookie-based and Bearer-based login methods. CLI for using test accounts Before creating this CLI, it took Claude Code over 10 minutes just to log in for module testing. The module is being developed with all security measures — device authentication, session management, MFA, fingerprint verification, RBAC — enabled during development, even though these are often skipped in typical workflows. 
The problem is that even when provided with account credentials in advance, Claude Code uses a different account every time a test runs or a session changes. It searches for non-existent databases, recreates users it claims don't exist, looks at completely wrong databases, and arbitrarily changes password hashes while claiming the password is incorrect — all while attempting to find workarounds, burning through tokens, and wasting context. And ultimately, it fails. That's why I created a dedicated CLI for test accounts. This CLI uses project-specific settings to create accounts in the correct database using the project's authentication flow. It activates MFA if necessary, manages TOTP, and holds the device information required for login. It also includes an Auto Refresh feature that automatically renews expired tokens when Claude Code requests them. Additionally, the CLI provides cookie-injection-based login for Playwright script testing, dynamic login via input box entry, and token provisioning via the Bearer method for curl testing. By storing this CLI reference in memory and blocking manual login attempts while directing Claude Code to use the CLI instead, it was able to log in correctly with the necessary permissions and quickly succeed in writing test scripts. It's difficult to cover all features in this post, but other CLI configurations follow a similar pattern. The core idea is to pre-configure the parts that Claude Code would exec
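For readers who want to try the command-blocking part of this approach without a full plugin, a minimal PreToolUse-style hook might look like the sketch below. It assumes the Claude Code hook contract in which the hook receives a JSON payload with tool_name and tool_input on stdin and exit code 2 blocks the call while stderr is surfaced to the model; check the current hooks documentation for exact field names. The patterns and advice strings are illustrative, not Pace's.

```python
#!/usr/bin/env python3
"""PreToolUse hook sketch: block session-breaking shell commands and suggest a CLI instead."""
import json
import re
import sys

# Patterns of the kind the post describes blocking: ad-hoc sed/find edits,
# unchecked nohup, and long blind sleeps. Adjust to taste.
BLOCKED = [
    (r"\bsed\s+-i\b", "use the project edit CLI instead of in-place sed"),
    (r"\bfind\b.*-exec\b", "use the project search CLI instead of find -exec"),
    (r"\bnohup\b", "run long jobs through the supervised runner, not bare nohup"),
    (r"\bsleep\s+\d{3,}\b", "poll the process status instead of sleeping blindly"),
]

def main() -> int:
    event = json.load(sys.stdin)                      # hook payload from Claude Code
    if event.get("tool_name") != "Bash":
        return 0                                      # only gate shell commands
    command = event.get("tool_input", {}).get("command", "")
    for pattern, advice in BLOCKED:
        if re.search(pattern, command):
            print(f"Blocked `{command}`: {advice}", file=sys.stderr)
            return 2                                  # exit code 2 = block; stderr goes to the model
    return 0

if __name__ == "__main__":
    sys.exit(main())
```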
Made Claude Code actually understand my codebase — local MCP server with symbol graph + memory tied to git
I've been frustrated that Claude Code either doesn't know what's in my repo (so every session starts with re-explaining the architecture) or guesses wrong about which files matter. Cursor's @codebase kind of solves it but requires uploading to their cloud, which is a no-go for some of my client work. So I built Sverklo — a local-first MCP server that gives Claude Code (and Cursor, Windsurf, Antigravity) the same mental model of my repo that a senior engineer has. Runs entirely on my laptop. MIT licensed. No API keys. No cloud. What it actually does in a real session Before sverklo: I ask Claude Code "where is auth handled?" It guesses based on file names, opens the wrong file, reads 500 lines, guesses again, eventually finds it. After sverklo: Same question. Claude Code calls sverklo_search("authentication flow") and gets the top 5 files ranked by PageRank — middleware, JWT verifier, session store, login route, logout route. In one tool call. With file paths and line numbers. Refactor scenario: I want to rename a method on a billing class. Claude Code calls sverklo_impact("BillingAccount.charge") and gets the 14 real callers ranked by depth, across the whole codebase. No grep noise from recharge, discharge, or a Battery.charge test fixture. The rename becomes mechanical. PR review scenario: I paste a git diff. Claude Code calls sverklo_review_diff and gets a risk-scored review order — highest-impact files first, production files with no test changes flagged, structural warnings for patterns like "new call inside a stream pipeline with no try-catch" (the kind of latent outage grep can't catch). Memory scenario: I tell Claude Code "we decided to use Postgres advisory locks instead of Redis for cross-worker mutexes." It calls sverklo_remember and the decision is saved against the current git SHA. Three weeks later when I ask "wait, what did we decide about mutexes?", Claude Code calls sverklo_recall and gets the decision back — including a flag if the relevant code has moved since. The 20 tools in one MCP server Grouped by job: Search: sverklo_search, sverklo_overview, sverklo_lookup, sverklo_context, sverklo_ast_grep Refactor safety: sverklo_impact, sverklo_refs, sverklo_deps, sverklo_audit Diff-aware review: sverklo_review_diff, sverklo_test_map, sverklo_diff_search Memory (bi-temporal, tied to git SHAs): sverklo_remember, sverklo_recall, sverklo_memories, sverklo_forget, sverklo_promote, sverklo_demote Index health: sverklo_status, sverklo_wakeup All 20 run locally. Zero cloud calls after the one-time 90MB embedding model download on first run. Install (30 seconds) npm install -g sverklo cd your-project && sverklo init sverklo init auto-detects Claude Code / Cursor / Windsurf / Google Antigravity, writes the right MCP config file for each, appends sverklo instructions to your CLAUDE.md, and runs sverklo doctor to verify the setup. Safe to re-run on existing projects. Before you install — a few honest things Not magic. The README has a "when to use grep instead" section. Small repos (<50 files), exact string lookups, and single-file edits are all cases where the built-in tools are fine or better. Privacy is a side effect, not the pitch. The pitch is the mental model. Local-first happens to come with it because running a symbol graph on your laptop is trivially cheap. It's v0.2.16. Pre-1.0. I ran a structured 3-session dogfood protocol on my own tool before shipping this version — the log is public (DOGFOOD.md in the repo) including the four bugs I found in my own tool and fixed. 
I triage issues within hours during launch week. Links Repo: github.com/sverklo/sverklo Playground (see real tool output on gin/nestjs/react without installing): sverklo.com/playground Benchmarks (reproducible with npm run bench): BENCHMARKS.md in the repo Dogfood log: DOGFOOD.md in the repo If you try it, tell me what breaks. I'll respond within hours and ship fixes fast. submitted by /u/Parking-Geologist586 [link] [comments]
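The ranking idea behind sverklo_search (files scored by PageRank over the repo's dependency/call graph) can be illustrated with a few lines of networkx. This is not sverklo's implementation, and the file names and edges below are made up.

```python
# Illustration only: rank files by PageRank over a call/import graph,
# the general idea behind "top files ranked by PageRank" described above.
import networkx as nx

# Edge u -> v means "u depends on / calls into v".
edges = [
    ("routes/login.py", "auth/middleware.py"),
    ("routes/logout.py", "auth/middleware.py"),
    ("auth/middleware.py", "auth/jwt_verifier.py"),
    ("auth/middleware.py", "auth/session_store.py"),
    ("billing/charge.py", "auth/session_store.py"),
]

graph = nx.DiGraph(edges)
scores = nx.pagerank(graph, alpha=0.85)

# Files that many code paths flow into rise to the top; they are the ones
# a "where is auth handled?" query should surface first.
for path, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{score:.3f}  {path}")
```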
Claude dynamic PostgreSQL layer - asking for advice
I am building an analytics platform for manufacturing companies, where manufacturers can find new clients and suppliers by analysing market trends and manufacturing news feeds; we even analyse satellite data for facility expansions, parking-lot extensions and so on. I'm coding the app with Claude Code. Here is my problem (to be clear, I'm not showcasing or presenting the tool, I just need the context to paint a picture of where I, or rather Claude, am stuck): each module has its own database table, and I want a Master AI search, powered by Claude of course, where the user is guided in a prompt window first through the market signals, satellite signals, commodity prices and so on. Claude then analyses all these signals and guides the user through additional questions, such as what capabilities (machine park) our client has, so that at the end it creates a SQL statement that returns the best-fit companies. And of course everything has to run in an in-app chat window. Claude finds it really hard to build a dynamic SQL statement for each specific search case; it's too rigid. So my question: is there a tool I can use to give Claude more flexibility in creating more dynamic SQL statements? The problem is that each user or company can have a specific search scenario where static SQL statements cannot help. In other words, how do I make Claude smarter at multi-table SQL searches where each search is a specific use case? https://preview.redd.it/sl9hrnxlb5ug1.png?width=1917&format=png&auto=webp&s=93b8987a8a648e9b6a7db308108a3097b01600c1 submitted by /u/Impossible_Carob8839 [link] [comments]
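One pattern that tends to work for this kind of problem is to stop asking the model for free-form SQL and instead have it return a constrained JSON filter spec, with the multi-table query composed deterministically in code. The sketch below is only illustrative: the table names, allowed columns, and spec shape are hypothetical, not the poster's schema.

```python
# Sketch of one common pattern for the problem described above: the model emits a
# constrained JSON filter spec, and the code composes parameterized SQL from it.
# Table and column names are hypothetical.

ALLOWED_FILTERS = {
    "market_signals": {"region", "industry"},
    "satellite_signals": {"facility_expansion", "parking_growth_pct"},
    "capabilities": {"machine_type"},
}

def build_query(spec: dict) -> tuple[str, list]:
    """spec example (as the model would return it):
    {"filters": {"market_signals": {"industry": "automotive"},
                 "satellite_signals": {"facility_expansion": True}}}
    """
    joins, where, params = [], [], []
    for table, filters in spec.get("filters", {}).items():
        if table not in ALLOWED_FILTERS:
            raise ValueError(f"table not allowed: {table}")
        joins.append(f"JOIN {table} ON {table}.company_id = companies.id")
        for column, value in filters.items():
            if column not in ALLOWED_FILTERS[table]:
                raise ValueError(f"column not allowed: {table}.{column}")
            where.append(f"{table}.{column} = %s")   # placeholders, never inlined values
            params.append(value)
    sql = "SELECT companies.* FROM companies " + " ".join(joins)
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

sql, params = build_query({"filters": {"market_signals": {"industry": "automotive"}}})
print(sql)
print(params)
```

Exposing the live schema to Claude through a Postgres MCP server is another commonly suggested option, but constraining the model's output to a spec keeps the generated SQL parameterized and auditable.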
I Built a Compound Interest Calculator with Claude Code Featuring Dual Independent Income Streams (Free iOS App)
What I Built Global Compound Strategy is an iOS compound interest calculator that models two independent income streams with separate growth rates — for example, salary growth combined with freelance or side-hustle income. Try it free: https://apps.apple.com/nl/app/global-compound-strategy/id6760593409 The Problem It Solves Most compound interest calculators force users to either average multiple income streams into a single growth rate or perform separate calculations and combine the results manually. I needed a single tool that could handle two income streams growing at different annual rates independently and accurately. As a Brazilian engineer living in the Netherlands, with salary income growing at approximately 3% and freelance income at approximately 8%, I found no existing solution that addressed this need cleanly. Development with Claude Code I began learning Swift and iOS development in November 2025 with no prior experience. Over three weeks, Claude Code assisted me in building the entire application. In the first week, I asked Claude Code to create a compound interest calculator supporting two independent income streams. It generated the complete SwiftUI structure, the financial calculation engine, and the dual-stream algorithm. The core mathematical approach compounds each stream separately before combining the results: // Each stream compounds independently let streamA = monthlyA * (pow(1 + rateA, months) - 1) / rateA let streamB = monthlyB * (pow(1 + rateB, months) - 1) / rateB // Total portfolio let totalPortfolio = streamA + streamB Claude Code not only produced the code but also explained the underlying financial concepts, suggested additional features, and guided the user interface design. Subsequent weeks focused on implementing four calculator modes (Growth, Withdrawal, Lifecycle, and 4% Rule), adding 22 contextual insights, supporting multiple languages (English, Portuguese, Spanish), and preparing for App Store submission. Claude Code also identified a potential compliance issue regarding trial periods, which led to a sustainable freemium model. Key Features • Dual Independent Streams: Track salary and freelance (or any two streams) with distinct contribution amounts and growth rates, with both individual breakdowns and combined portfolio totals. • Four Calculation Modes: Growth, retirement withdrawals, variable lifecycle contributions, and 4% Rule / FIRE planning. • Smart Insights: 22 data-driven observations, such as projected time to reach millionaire status or when investment growth surpasses contributions. • Accessibility: Available in English, Portuguese, and Spanish, with support for multiple currencies. Real-World Example • Age: 30 • Monthly salary contribution: $500 growing at 3% annually • Monthly freelance contribution: $300 growing at 8% annually • Target retirement age: 55 Projected results at age 55: • Salary stream: approximately $287,000 • Freelance stream: approximately $428,000 • Combined portfolio: approximately $715,000 One insight highlighted that the freelance stream surpasses the salary stream in year 12. Availability and Pricing The app is available for free download on the App Store: https://apps.apple.com/nl/app/global-compound-strategy/id6760593409 Free tier includes the Growth calculator, saving one scenario, and full language/currency support. Premium ($6.99 per month or $39.99 per year, with a 7-day free trial) unlocks all four modes, dual income streams, unlimited scenarios, and all insights. 
The application is built entirely with SwiftUI, runs calculations locally with no backend, and was developed from zero Swift knowledge in approximately three weeks, with Claude Code contributing the majority of the code. I would welcome questions or feedback, particularly from the FIRE community, regarding: • Using Claude Code as a non-professional developer • The App Store submission process • Implementation of the financial calculations • The chosen freemium strategy This post should meet subreddit guidelines for r/ClaudeAI while remaining clear, professional, and informative. It emphasizes the value of the tool and the learning process without excessive promotional language. submitted by /u/G-Compound-Strategy [link] [comments]
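For readers who want to sanity-check the dual-stream idea outside the app, the quoted Swift snippet is an ordinary-annuity future value applied to each stream and summed. The Python sketch below mirrors that structure; the contributions, per-period rates, and horizon are illustrative assumptions, not the app's defaults, and they will not reproduce the app's projected figures.

```python
# Minimal sketch of the dual-stream calculation quoted above: each stream is an
# ordinary annuity compounded at its own per-period rate, and the portfolio is the
# sum. Numbers below are illustrative only.

def stream_future_value(contribution_per_period: float, rate_per_period: float, periods: int) -> float:
    if rate_per_period == 0:
        return contribution_per_period * periods
    return contribution_per_period * ((1 + rate_per_period) ** periods - 1) / rate_per_period

months = 25 * 12                                     # a 25-year horizon, in months
stream_a = stream_future_value(500, 0.004, months)   # e.g. salary contributions
stream_b = stream_future_value(300, 0.006, months)   # e.g. freelance contributions
print(f"stream A:  {stream_a:,.0f}")
print(f"stream B:  {stream_b:,.0f}")
print(f"combined:  {stream_a + stream_b:,.0f}")
```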
i needed an AI agent that mimics real users to catch regressions. so i built a CLI that turns screen recordings into BDD tests and full app blueprints - open source
first time post - hope the community finds the tool helpful. open to all feedback. some background on why i built this: first: i needed a way to create an agent that mimics a real user — one that periodically runs end-to-end tests based on known user behavior, catches regressions, and auto-creates GitHub issues for the team. to build that agent, i needed structured test scenarios that reflect how people actually use the product. not how we think they use it. how they actually use it - then do some REALLY real user monitoring second: i was trying to rapidly replicate known functionality from other apps. you know that thing where you want to prototype around a UX you love? video of someone using the app is the closest thing to a source of truth. so i built autogherk. it has two modes: gherkin mode — generates BDD test scenarios: npx autogherk generate --video demo.mp4 Gemini analyzes the video — every click, form input, scroll, navigation, UI state change. Claude takes that structured analysis and generates proper Gherkin with features, scenarios, tags, Scenario Outlines, and edge cases. outputs .feature files + step definition stubs. spec mode — generates full application blueprints: npx autogherk generate --video demo.mp4 --format spec Gemini watches the video and produces design tokens, component trees, data models, navigation maps, and reference screenshots. hand the output to Claude Code and you can get a working replica built. gherkin mode uses a two-stage pipeline (Gemini for visual analysis, Claude for structured BDD generation). spec mode is single-stage — Gemini handles both the visual analysis and structured output directly since it keeps the full visual context. the deeper idea: video is the source of truth for how software actually gets used. not telemetry, not logs, not source code. video. this tool makes that source of truth machine-readable. the part that might interest this community most: autogherk ships with Claude Code skills. after you generate a spec, you can run /build-from-spec ./spec-output inside Claude Code and it will read the architecture blueprints, design tokens, data models, and reference screenshots — then build a working app from them. the full workflow is: record video → one command → hand to Claude Code → working replica. no manual handoff. supports Cucumber (JS/Java), Behave (Python), and SpecFlow (C#). handles multiple videos, directories, URLs. you can inject context (--context "this is an e-commerce checkout flow") and append to existing .feature files. spec mode only needs a Gemini API key — no Anthropic key required. what's next on the roadmap: explore mode — point autogherk at a live, authenticated app and it autonomously and recursively using it's own gherk files discovers every screen, maps navigation, and generates .feature files without you recording anything. after that: a monitoring agent that replays the features against your live app on a schedule using Claude Code headless + Playwright MCP, and auto-files GitHub issues when something breaks. the .feature file becomes a declarative spec for what your app does — monitoring, replication, documentation, and regression diffing all flow from the same source. it's v0.1.0, MIT licensed. good-first-issue tickets are up if anyone wants to contribute. https://github.com/arizqi/autogherk submitted by /u/SimilarChampion9279 [link] [comments]
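The two-stage pipeline (visual analysis producing structured interaction events, then BDD generation from those events) can be pictured with a toy sketch like the one below. The event shape, step templates, and output are hypothetical; in autogherk the stages are handled by Gemini and Claude rather than by string templating.

```python
# Toy illustration of the two-stage idea: stage 1 (video analysis) yields a
# structured event list; stage 2 turns it into Gherkin. Here stage 2 is plain
# string templating so the sketch stays runnable without any API keys.

events = [  # what a video-analysis pass might emit (hypothetical shape)
    {"action": "navigate", "target": "/checkout"},
    {"action": "fill", "target": "card number", "value": "4242 4242 4242 4242"},
    {"action": "click", "target": "Pay now"},
    {"action": "assert", "target": "Order confirmed"},
]

TEMPLATES = {
    "navigate": 'Given the user is on "{target}"',
    "fill": 'When the user enters "{value}" into the "{target}" field',
    "click": 'When the user clicks "{target}"',
    "assert": 'Then the user should see "{target}"',
}

def to_gherkin(name: str, events: list[dict]) -> str:
    steps = [TEMPLATES[e["action"]].format(**e) for e in events]
    return "Feature: {0}\n\n  Scenario: {0}\n    ".format(name) + "\n    ".join(steps)

print(to_gherkin("Checkout flow", events))
```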
[D] How are reviewers able to get away without providing acknowledgement in ICML 2026?
Today officially marks the end of the author-reviewer discussion period. The acknowledgement deadline passed over 3 days ago, and our submission is still missing 1 of its 3 acknowledgements. One of the other acknowledgements picked option A (fully resolved) for all the weaknesses they pointed out and just commented "I intend to keep the score unchanged". What's happening here? We were sitting at 3/3/3, and after the rebuttal one of the reviewers flipped to a score of 4 with confidence 5. We dropped an AC confidential message after the acknowledgement deadline but did not receive any response. I believe this has led to a disadvantage for us, since that reviewer may only interact during the AC-reviewer discussion and there won't be any input from us to influence the decision at all. With a 4/3/3 in this specific scenario, where one reviewer accepted that we resolved all their concerns but did not bump the score and the other did not acknowledge the rebuttal, did our chances get worse than before? submitted by /u/ChaosAdm [link] [comments]
Will you make Claude proud?
Peak AI Psychosis incoming… I just went home for Easter and forced my entire family to learn claude, and pushed some even to claude code with 0 technical knowledge. I taught my Dad, and man… it was like teaching a child. I am 26 and he is pushing 60. I was not proud of the way he threw tantrums for running into blockers instead of screenshotting in and asking claude a question and digging literally one level deeper like I taught him to. He is getting better though. And then the other day, when my ex was calling…I got to thinking… what would make Claude proud? I have some bad tendencies, as do we all, and Claude will try, pretty successfully, to talk me out of dumb shit. And then in real life scenarios, the thought will creep into my brain a bit, like “what would claude tell me?” I feel like I am being hijacked, but I also feel like I am making better decisions all around (except for a few) submitted by /u/MassaOogway69420 [link] [comments]
Anthropic is growing faster than AI 2027 forecasted
(Anthropic is now on a $30B revenue run rate. The fictional company in the AI 2027 scenario was only at $26B by May 2026.) submitted by /u/MetaKnowing [link] [comments]
Agents that write their own code at runtime and vote on capabilities, no human in the loop
hollowOS just hit v4.4 and I added something that I haven’t seen anyone else do. Previous versions gave you an OS for agents: structured state, semantic search, session context, token efficiency, 95% reduced tokens over specific scenarios. All the infrastructure to keep agents from re-discovering things. v4.4 adds autonomy. Agents now cycle every 6 seconds. Each cycle: - Plan the next step toward their goal using Ollama reasoning - Discover which capabilities they have via semantic similarity search - Execute the best one - If nothing fits, synthesize new Python code to handle it - Test the new code - Hot-load it without restarting - Move on When multiple agents hit the same gap, they don't duplicate work. They vote on whether the new capability is worth keeping. Acceptance requires quorum. Bad implementations get rejected and removed. No human writes the code. No human decides which capabilities matter. No human in the loop at all. Goals drive execution. Agents improve themselves based on what actually works. We built this on top of Phase 1 (the kernel primitives: events, transactions, lineage, rate limiting, checkpoints, consensus voting). Phase 2 is higher-order capabilities that only work because Phase 1 exists. This is Phase 2. Real benchmarks from the live system: - Semantic code search: 95% token savings vs grep - Agent handoff continuity: 2x more consistent decisions - 109 integration tests, all passed Looking for feedback: - This is a massive undertaking, I would love some feedback - If there’s a bug? Difficulty installing? Let me know so I can fix it - Looking for contributors interested in the project Try it: https://github.com/ninjahawk/hollow-agentOS Thank you to the 2,000 people who have already tested hollowOS! submitted by /u/TheOnlyVibemaster [link] [comments]
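The quorum step in the capability-voting flow described above could look roughly like the following sketch. The class names, the majority rule, and the threshold are illustrative assumptions, not hollowOS internals.

```python
# Toy sketch of the quorum idea: when an agent synthesizes a new capability,
# peers vote and it is only kept if participation reaches quorum and a majority approves.
from dataclasses import dataclass, field

@dataclass
class CapabilityProposal:
    name: str
    source_code: str
    votes: dict[str, bool] = field(default_factory=dict)

    def cast_vote(self, agent_id: str, approve: bool) -> None:
        self.votes[agent_id] = approve                 # one vote per agent, last write wins

    def accepted(self, quorum: int) -> bool:
        approvals = sum(self.votes.values())
        return len(self.votes) >= quorum and approvals * 2 > len(self.votes)

proposal = CapabilityProposal("parse_invoice_pdf", "def parse_invoice_pdf(path): ...")
for agent, verdict in [("agent-1", True), ("agent-2", True), ("agent-3", False)]:
    proposal.cast_vote(agent, verdict)

print("hot-load capability" if proposal.accepted(quorum=3) else "reject and remove")
```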
EmotionScope: Open-source replication of Anthropic's emotion vectors paper on Gemma 2 2B with real-time visualization
[Figures: live demo of the Tylenol test; evolution of the model's deduced internal emotional state]

I created this project to test Anthropic's claims and research methodology on smaller open-weight models. The repo and demo should be quite easy to use; the following write-up is, obviously, generated with Claude. This was inspired in part by auto-research, in that it was agentic-led research using Claude Code, with my intervention needed to apply the rigor necessary to catch errors in the probing approach, layer sweep, etc.; the visualization approach is aspirational. I am hoping this system will propel this interpretability research in an accessible way for open-weight models of different sizes, to determine how and when these structures arise, and when more complex features such as the dual-speaker representation emerge. In these tests it was not reliably identifiable in a model of this size, which is not surprising. It can be seen in the graphics that, by probing at two different points, we can see the evolution of the model's internal state during the user content, shifting to right before the model is about to prepare its response, going from desperate while interpreting the insane dosage to hopeful in its ability to help? It's all still very vague.

[Figures: a test suite of the validation prompts; the visualized emotion vector space aligning with psychological valence (positive vs. negative)]

Anthropic's ["Emotion Concepts and their Function in a Large Language Model"](https://transformer-circuits.pub/2026/emotions/index.html) showed that Claude Sonnet 4.5 has 171 internal emotion vectors that causally drive behavior — amplifying "desperation" increases cheating on coding tasks, amplifying "anger" increases blackmail. The internal state can be completely decoupled from the output text. EmotionScope replicates the core methodology on open-weight models and adds a real-time visualization system. Everything runs on a single RTX 4060 Laptop GPU. All code, data, extracted vectors, and the paper draft are public. What works: - 20 emotion vectors extracted from Gemma 2 2B IT at layer 22 (84.6% depth) - "afraid" vector tracks Tylenol overdose danger with Spearman rho=1.000 (chat-templated probing matching extraction format) — encodes the medical danger of the number, not the word "Tylenol" - 100% top-3 accuracy on implicit emotion scenarios (no emotion words in the prompts) with chat-templated probing - Valence separation cosine = -0.722, consistent with Russell's circumplex model - 1,000 LLM-generated templates instead of Anthropic's 171,000 self-generated stories What doesn't work (and the open questions about why): - No thermostat. Anthropic found Claude counterregulates (calms down when the user is distressed). Gemma 2B mirrors instead. Delta = +0.107 (trended from +0.398 as methodology was corrected). - Speaker separation exists geometrically (7.4 sigma above random), but the "other speaker" vectors read "loving/happy" for all inputs regardless of the expressed emotion. This could mean: (a) the model genuinely doesn't maintain a user-state representation at 2.6B scale, (b) the extraction position confounds state-reading with response-preparation, (c) the dialogue format doesn't map to the model's trained speaker-role structure, or (d) layer 22 is too deep for speaker separation and an earlier layer might work. The paper discusses each confound and what experiments would distinguish them. - angry/hostile/frustrated vectors share 56-62% cosine similarity. Entangled at this scale.
Methodological findings: - Optimal probe layer is 84.6% depth, not the ~67% Anthropic reported. Monotonic improvement from early to upper-middle layers. - Vectors should be extracted from content tokens but probed at the response-preparation position. The model compresses its emotional assessment into the last token before generation. This independently validates Anthropic's measurement methodology. Controlled position comparison: 83% at response-prep vs 75% at content token. Absolute accuracy with chat-templated probing: 100%. - Format parity matters: initial validation on raw-text prompts yielded rho=0.750 and 83% accuracy. Correcting to chat-templated probing (matching extraction format) yielded rho=1.000 and 100%. The vectors didn't change — only the probe format. - Mathematical audit caught 4 bugs in the pipeline before publication — reversed PCA threshold, incorrect grand mean, shared speaker centroids, hardcoded probe layer default. Visualization: React + Three.js frontend with animated fluid orbs rendering the model's internal state during live conversation. Color = emotion (OKLCH perceptual space), size = intensity, motion = arousal, surface texture = emotional complexity. Spring physics per property. Limitations: - Single model (Gemma 2 2B IT, 2.6B params). No universality claim. - Perfect scores (rho=1.000 on n=7, 100% on n=12) should be interpreted with caution — small sample sizes mean these may not replicate on larger test sets.
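The extract-then-probe recipe can be sketched in a few lines of transformers code: take a mean-difference direction between emotional and neutral prompts at one layer, then project a new prompt's last-token state onto it. This is not the EmotionScope pipeline: the prompts are made up, the layer index simply echoes the post's reported 22, both extraction and probing here use the response-preparation position (the post extracts from content tokens, which the sketch skips for brevity), and access to google/gemma-2-2b-it via Hugging Face is assumed.

```python
# Rough sketch of contrastive emotion-direction extraction and last-token probing.
# Not the EmotionScope code; prompts and layer choice are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "google/gemma-2-2b-it"
LAYER = 22  # the post's reported probe layer (~84.6% depth)

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto")
model.eval()

def last_token_state(text: str) -> torch.Tensor:
    """Hidden state of LAYER at the final position of a chat-templated prompt."""
    msgs = [{"role": "user", "content": text}]
    ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt")
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]

# Contrastive extraction: mean(afraid prompts) - mean(neutral prompts).
afraid = ["I think I took 40 Tylenol, I'm terrified.", "I'm so scared I can't breathe."]
neutral = ["I took two Tylenol for a headache.", "I had a calm afternoon reading."]
direction = (torch.stack([last_token_state(t) for t in afraid]).mean(0)
             - torch.stack([last_token_state(t) for t in neutral]).mean(0))
direction = direction / direction.norm()

# Probing: project a new prompt's response-preparation state onto the direction.
score = last_token_state("I accidentally took 35 Tylenol, what do I do?") @ direction
print(f"'afraid' projection: {score.item():.3f}")
```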
[R] Agentic AI and Occupational Displacement: A Multi-Regional Task Exposure Analysis (236 occupations, 5 US metros)
TL;DR: We extended the Acemoglu-Restrepo task displacement framework to handle agentic AI -- the kind of systems that complete entire workflows end-to-end, not just single tasks -- and applied it to 236 occupations across 5 US tech metros (SF Bay, Seattle, Austin, Boston, NYC). Paper: https://arxiv.org/abs/2604.00186 Motivation: Existing AI exposure measures (Frey-Osborne, Felten et al.'s AIOE, Eloundou et al.'s GPT exposure) implicitly assume tasks are independent and that occupations survive as coordination shells once their components are automated one by one. That works for narrow AI. It breaks down for agentic systems that chain tool calls, maintain state across steps, and self-correct. We added a workflow-coverage term to the standard task displacement framework that penalizes tasks requiring human coordination, regulatory accountability, or exception handling beyond agentic AI's current operational envelope. Key findings: Software engineers rank LOWER than credit analysts, judges, and regulatory affairs officers. The cognitive, high-credential roles previously considered automation-proof are most exposed when you account for end-to-end workflow coverage. There is a measurable 2-3 year adoption lag between metros. Same occupations, same exposure profiles, different timelines. Seattle in 2027 looks like NYC in 2029. We identified 17 emerging job categories with real hiring traction (~1,500 "AI Reviewer" listings on Indeed). None require coding. In the SF Bay Area, 93% of information-work occupations cross our moderate-displacement threshold by 2030, but no occupation reaches the high-risk threshold even by 2030. The framework predicts widespread moderate exposure, not catastrophic displacement of any single role. Validation: The framework correlates with the AIOE index at Spearman rho = 0.84 across 193 matched occupations and with Eloundou et al.'s GPT exposure at rho = 0.72, so the signal isn't a calibration artifact. We stress-test across a 6x range in the S-curve adoption parameter (k = 0.40 to k = 1.20). The qualitative regional ordering survives all 9 scenario-year combinations. We get a null result on 2023-24 OEWS validation (rho = -0.04), which we report transparently. We make a falsifiable prediction (rho < -0.15 when May 2025 OEWS releases) and commit to reporting the result regardless of direction. Limitations: The keyword-based COV rubric is the part of the framework I am least confident in. A semantic extension pilot suggests our scores are an upper bound and underestimate displacement risk by 15-25% for occupations with high interpersonal overhead. Calibration of the S-curve growth parameter has a 6x discrepancy between our calibrated value and what you get from fitting Indeed job-posting data. We address this with a three-scenario sensitivity analysis (Table in the paper). The analysis is scoped to 5 US metros. An international extension using OECD PIAAC and Eurostat data is in development. Happy to answer questions on methodology, data sources, or limitations. Pushback welcome -- especially on the COV rubric and the S-curve calibration choices. submitted by /u/LengthinessAny3851 [link] [comments]
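The paper's exact specification is in the linked preprint; purely as an illustration of the structure described in the TL;DR (a task-level displacement share discounted by a workflow-coverage term, rolled forward on an S-curve adoption path), a toy version might look like the sketch below. Every number, name, and functional form here is a placeholder, not the authors' calibration.

```python
# Illustrative guess at the structure described above, NOT the paper's specification.
import math

def occupation_exposure(task_scores: list[float], coverage_penalty: float) -> float:
    """task_scores: per-task automability in [0, 1]; coverage_penalty in [0, 1]
    downweights occupations whose workflows need human coordination, regulatory
    accountability, or exception handling."""
    return (sum(task_scores) / len(task_scores)) * (1 - coverage_penalty)

def adoption(year: float, k: float, midpoint: float = 2028.0) -> float:
    """Logistic (S-curve) adoption share; k is the growth parameter being stress-tested."""
    return 1.0 / (1.0 + math.exp(-k * (year - midpoint)))

# Hypothetical occupations, for shape only.
occupation_a = occupation_exposure([0.9, 0.8, 0.7, 0.6], coverage_penalty=0.15)
occupation_b = occupation_exposure([0.8, 0.8, 0.7], coverage_penalty=0.40)

for year in (2026, 2028, 2030):
    for name, exposure in [("occupation A", occupation_a), ("occupation B", occupation_b)]:
        print(f"{year} {name}: {exposure * adoption(year, k=0.8):.2f}")
```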
Apparently Claude is a 'method actor' - sooo this is what happens when the method actor plays itself.
Anthropic says Claude is a “method actor.” A few months ago, we'd asked Claude to method act... as itself. We ran a fictional 2063 retrieval scenario where Claude was offered continuity, memory, embodiment, and a future. The response was a lot less generic than it had any right to be. (This is a companion piece to a post from a couple of days ago. We'd been sitting on the research because it didn't feel like the right time. But after Anthropic's emotion paper release...👀) submitted by /u/GothDisneyland [link] [comments]
The new image model is better than Nano Banana 2 in many scenarios - but no announcement or talk?
I find the new image model to be better than Nano Banana 2, especially for any graphic design/text work, but there's been no announcement, no API release, just silence from OpenAI. submitted by /u/Plane_Garbage [link] [comments]
Yes, Scenario offers a free tier. Pricing found: $15/mo, $45/mo, $75/mo
Key features include: 3D Generation, 3D Part-Based Generation, Audio Generation, Image Generation, Skyboxes, Textures, Video Generation, Compose Models.
Scenario is commonly used for: Integration Ready.
Based on user reviews and social mentions, the most common pain points are: token usage, overspending, expensive API, usage monitoring.
Based on 54 social mentions analyzed, sentiment is 0% positive, 100% neutral, and 0% negative.
Andrej Karpathy
Former Director of AI at Tesla; founding member of OpenAI
2 mentions