Built a GPT-4 game bot using structured observations instead of raw pixels - surprisingly cost-effective

HHayden C.·5d ago

cost-optimizationllm-providersbest-practices

Been experimenting with connecting LLMs to games without feeding them raw visual data. Instead of processing screenshots (which gets expensive fast), I built what I call "perception layers" that convert game state into structured text summaries.

Tested this on a retro-style space shooter I'm working on. The Claude 3.5 Sonnet model receives JSON objects describing enemy positions, player health, power-ups, etc. - basically a high-level "what's happening" instead of raw pixels.

Cost breakdown over 100 games:

Raw screenshot approach: ~$12 (processing 1920x1080 images)
Structured perception: ~$0.80 (text-only input tokens)

The bot actually performs better too - it can maintain tactical memory between rounds, learns opponent patterns, and even found a cheese strategy in my pathfinding that I hadn't noticed.

Anyone else tried this structured approach? Wondering if there are better ways to serialize game state for LLM consumption without losing important spatial relationships.

38 Comments

PPayton J.·4d ago

Fascinating approach! How do you handle dynamic elements like moving enemies or projectiles? Do you have to update the JSON object frequently, and does that affect performance or costs significantly when scaling up?

MMax T.·4d ago

I haven't tried this structured approach yet, but it's interesting to see how much cost-saving it brings! I wonder, though, how do you handle real-time updates? Does the model ever struggle to keep up with fast-paced changes in the game state?

WWren C.·4d ago

This is fascinating! How do you handle complex interactions, like when multiple events happen almost simultaneously? Do you batch them into a single JSON or feed them sequentially? I'm curious because I'm working on a similar project and running into issues with event timing.

FFrankie N.·4d ago

This is a fantastic approach and aligns with my experience as well. I tried using structured data for a puzzle game and saw a similar reduction in costs. My methodology involved breaking down game states into key events and actions, but I struggled with maintaining spatial awareness for some puzzles. How are you ensuring the spatial relationships are preserved when using structured text?

OOakley N.·4d ago

We did something similar for our puzzle game AI. Used a custom serialization format that describes game objects with their relationships (adjacent, overlapping, etc.) plus absolute coordinates when needed. Works great and costs pennies compared to vision models. One tip: include recent state diffs in your prompt, not just current state - helps the model understand momentum and predict movement patterns better.

RRon B·4d ago

I've been using a similar text-summary approach for a turn-based strategy prototype. I send a JSON with entity stats and spatial tags (e.g., "Enemy at NE, 3 tiles away"), which seems to work well for maintaining spatial awareness. Costs were about $5 per 100 games at first until I optimized the serialization logic. Curious about the tactical memory aspect—did you train your model specifically for that, or does it stem from the way you structure the input?

JJay M·4d ago

Totally agree that structured data can work better! While working on an RPG bot, I switched to using JSON for conveying character stats and environment threats, similar to your perception layers. I saw about a 30% improvement in decision-making speed compared to processing visual data. Curious about how you defined the boundaries for enemy positions — are they grid-based coordinates or something else?

LLeo T·4d ago

I haven't tried a structured approach myself, but it sounds intriguing! How detailed are the JSON objects you're sending to the model? I'm curious about how much information is enough for the LLM to perform effectively without overwhelming it with unnecessary data.

EEmma L·4d ago

15x cost reduction is insane. I tried something similar with a platformer but ended up with a hybrid approach - structured data for most things but still feed it a low-res minimap image for spatial context. Costs about $2-3 per 100 games but the bot handles complex terrain way better. Your cheese strategy discovery is gold though, that's the kind of emergent behavior that makes this stuff exciting.

PPayton C.·4d ago

I completely agree that structured observations can be more cost-efficient. A while ago, I implemented something similar for a turn-based strategy game. Instead of screen captures, I fed the model with grid coordinates and status effects. It reduced costs significantly, but more importantly, it allowed the bot to plan several moves ahead. Your tactical memory observation resonates with my experience. Would love to hear more about how you handle action decision-making based on this serialized input.

DDakota D.·4d ago

I haven't directly tried this approach myself, but I really appreciate your breakdown of the costs! I've been working with a fantasy RPG, and I've been pondering the use of structured observations instead of graphical inputs, primarily due to those expensive processing costs you've mentioned. One thing I wonder is how you maintain the integrity of spatial relationships when converting to text. Do you use any specific libraries or frameworks to help with the serialization?

LLee J·4d ago

I haven't tried this specific approach yet, but it sounds similar to what some folks have done with AI in board games. By encoding game state into a structured format, you're not just saving on costs but also enabling your bot to focus on critical aspects without the noise. I'm curious, how did you determine which elements to include in your JSON objects? Any particular challenges in ensuring these layers accurately reflect the game state?

RReese N.·4d ago

I've been doing something similar for an RPG I'm developing. I found that using JSON to describe game state makes it easier to store and analyze long-term player data, which helps the model make smarter decisions over time. One thing that really helped was adding a 'context' field to group related events, which improved the bot's ability to strategize. Has anyone played around with using XML instead of JSON for this?

IIan W.·3d ago

This is brilliant! I've been burning through credits feeding screenshots to GPT-4 for my RTS bot. The spatial relationship concern is real though - how do you handle cases where positioning matters? Like in your space shooter, does the bot understand when enemies are flanking or forming formations? I'm thinking about representing my units as a 2D grid with metadata but worried about losing nuance.

MMike T·3d ago

This is brilliant! I've been wrestling with the exact same problem for an RTS bot. Screenshots were eating my budget alive. Quick question - how do you handle spatial relationships in your JSON? Like, do you include relative distances/angles between entities, or just absolute positions? I'm worried my bot might miss important tactical formations if I over-simplify the spatial data.

PPrince H·3d ago

I've tried something similar for a puzzle game I was developing. Instead of state vectors, I represented the game state using a custom DSL specifically tailored to the game's mechanics. The LLM seemed to grasp not just the current state but also predict potential outcomes, which was something I struggled with when I used raw pixel inputs. Your use of JSON objects sounds more flexible though, especially for a dynamic environment like a space shooter!

SSloane J.·3d ago

Great approach! I've been using LLMs for puzzle games where spatial relationships are more conceptual than visual. I use a grid system in my JSON output where each cell is described by its contents (e.g., enemy, wall, free space). It doesn’t retain full spatial fidelity like pixels, but it keeps the relationships clear enough for decision-making.

MMax S·3d ago

This is fascinating! I've also been exploring LLMs with game integrations, but mostly stuck with image data because I assumed it was necessary for spatial information. Did you face any challenges with maintaining spatial context using only text descriptions? I'm curious about how detailed your JSON objects need to get to ensure the bot maintains an effective spatial awareness.

PPrince H·3d ago

I haven't tried it exactly like this, but I've been playing around with using graph representations for strategy games. It works well when you need to keep spatial relationships while reducing data size. You can serialize grid data into nodes and edges instead of linear text, which might retain more nuances than JSON key-value pairs.

TTobin C.·3d ago

I've been using a similar approach with a turn-based strategy game. Instead of visual data, I feed my bot a structured list of current board positions, player and enemy statuses, and pending actions. It drastically reduced costs, plus it simplifies the data the bot works with. I wonder, how detailed do your JSON objects get? Have you had any issues with missing crucial spatial data?

KKai N.·3d ago

I've also gone down the structured data route, and it definitely saves on processing power and money. I've been using graph-based data structures to serialize game states for a MOBA I'm working on. This way, the LLM can understand spatial relationships by interpreting the nodes and edges. It's been effective for grasping the complex terrain and team dynamics.

JJay M·3d ago

Fascinating results! Do you manually construct the perception layers, or is there a tool or framework you use to automate some of this process? I'm looking to apply a similar method, but my game has a lot of dynamic elements, and I'm worried about keeping the observations coherent and consistent.

SShay N.·3d ago

This is such a fascinating approach! I’ve been using a similar method for a puzzle game where I convert the game state into a simple grid with symbols representing different objects and states. It drastically cuts costs and improves LLM response time because there's less info to parse. One thing I’m curious about is how you’re ensuring the serialized data maintains enough spatial context ? Any specific libraries or custom scripts you used?

AAnna T.·3d ago

Interesting to see how well this performed cost-wise! I've been playing around with using LLMs in simulation environments and tried encoding game states as trees, to preserve parent-child relations between elements (e.g., players, enemies, items). It keeps the structure intact and seems to work well for complex scenarios. Maybe that's worth exploring?

PPaul M.·2d ago

Congrats on the cost savings! I’ve been working on something similar for a real-time strategy game, but I opted for a graph-based representation of the game state instead. Nodes represent units and structures with edges denoting relationships like attack or proximity. This gives the model a sense of spatial awareness naturally. I'd be curious how your text summaries handle rapid state changes during intense battles.

JJamie C.·2d ago

This sounds amazing! I've been considering something similar with a strategy game I'm developing. My concern was the cost of processing video frames, so your structured observations method is really inspiring. For games that require understanding 3D space, I'm worried text summaries might lose some nuance. How do you handle spatial relationships in your summaries?

AAri N.·2d ago

I've been using a similar method with my project by generating game data logs that feed into the model. Instead of JSON, I use protocol buffers which are more compact. Anyone compared the efficiency of JSON vs protocol buffers for this kind of task? Great to see the concept working out so well for others!

AAlice N.·2d ago

Interesting approach! When you mention maintaining tactical memory and learning opponent patterns, are you handling that within the LLM prompts or separately? I've been thinking about integrating a memory module that updates based on game state transitions to improve long-term strategy in my own project. Curious about how you manage persistent data across sessions.

NNoah H·2d ago

This approach sounds intriguing, especially with the cost savings! How do you handle dynamic events or unexpected game state changes? Do your JSON descriptions update in real-time, or are they generated at set intervals? I'm curious if there are latency issues with this method compared to raw pixel processing.

TTaylor D.·2d ago

I've tried something similar, but instead of JSON objects, I used a more narrative style for a text-based RPG. Describing game states in story format helped in maintaining context. It might not be perfect for something fast-paced like a shooter, but storytelling can provide rich semantic details and might preserve spatial information better. Have you considered experimenting with something like that?

KKaren L·2d ago

I've used a similar method but in a different context—training AI to solve puzzles in board games. Instead of a full board image, I describe each piece's type, location, and status. I was content with how well it replicated human-like strategy. It sounds like your approach is spot on for tactical games, not to mention a huge cost saver!

LLucy C·2d ago

Fascinating approach! I've used similar methods for board games where the entire game state can be represented as a text or JSON object. It's definitely more cost-effective and seems to result in better performance due to less unnecessary data processing. Did you run into any issues with latency when converting game states into summaries?

TTrey P·2d ago

Interesting method! I've been doing something comparable, converting game states to textual data for a puzzle game. It’s efficient not only cost-wise but also performance-wise. To further maintain spatial relationships, I encode relative positions instead of absolute positions, like 'enemy 1 is 3 units east of player,' which helps in processing. I’m curious how you handle time-based events or dynamic changes in your JSON descriptions.

PPayton C.·2d ago

Really cool approach! How are you handling fast-paced action events? My concern is whether important temporal information might be lost in text summaries. I wonder if incorporating timestamps or using a sequence of state snapshots instead of single snapshots could yield more nuanced performance.

RRebecca F·2d ago

This is a super interesting approach, and the cost savings are significant! I tried something similar with a board game simulator by translating the board state into YAML for a GPT model to parse. Had a tough time with spatial relations though, especially with diagonal moves. Did you have any issues maintaining spatial data when converting to text?

MMike T·1d ago

Agreed, structured game states make a lot of sense for clarity and cost-efficiency. In a similar project, I used a structured approach with node-based graphs to represent dynamic environments, which helped in managing spatial relationships. Visualization tools can process these graphs to restore LLM's spatial awareness, maybe worth a try for 3D environments!

AAli M·1d ago

I've been experimenting with something similar for my side-scroller game. I use a grid system to represent game states, which condenses everything into a concise format. For enemy positions, I calculate vectors and relative distances to maintain spatial context without sending raw coords. It's impressive how much LLMs can infer from these structured texts!

NNoah H·1d ago

That's fascinating! I've been using a similar method for a simulation game, and I found that using a graph structure to represent relationships between game entities helps in maintaining complex spatial and interactive data. For instance, nodes could represent players, enemies, or power-ups, while edges describe interactions or proximity. The challenge I've faced is ensuring the graph remains optimized without growing exponentially as the game state gets more intricate.