AI for Game Developers

2025 Update

Jonas Heinke

December 19, 2025

Notes

Welcome! This is an update to my Nerdicon talk from June. Focus: how AI works, why it keeps improving, and practical utility for game developers.

"Beliefs are for being true. Use them for nothing else.

If you need a good thing to happen, use a plan for that."

— Eliezer Yudkowsky

Notes

Sets the epistemic stance for the talk. We're here to understand what's true, not to cheerlead or fearmonger.

What We'll Cover

1 Foundation: How AI Works
2 Straight Lines: Scaling Laws & Forecasts
3 Theory of Mind: How to Work with AI
4 Mundane Utility: Practical Applications
5 Something Completely Different
6 Solstice

Notes

Structure: Foundation → Why it improves → How to use it → What to do with it → Ethics → Action. About 45-60 minutes + Q&A.

Foundation

How AI Works

Comic: 'How do they generate AI slop, dad?' Father explains transformer architecture with equations and diagrams. 'Oh. I should've guessed.'

Notes

Section 1: The technical foundation. Not too deep, but enough to understand. Goal: Demystify the "black box" - it's prediction, not magic.

A Bird's-Eye View of Machine Learning

[Diagram: compute, data, hyperparameters, architecture, and loss feed PRETRAINING (learn patterns from data); human preferences feed a reward model for POST-TRAINING (align with human preferences); the resulting weights ≈ the trained model; at INFERENCE, a query plus the parameters generates a response token by token → output]

"The Scaling Era" (Stripe Press, 2025)

Notes

This is the whole picture. Don't worry about understanding everything. Key phases: Pretraining (learn from data) → Post-training (make it useful) → Inference (answer questions). Compute/data/etc feed pretraining, human preferences feed the reward model, both combine in post-training.


What is a Transformer?

The architecture that made modern AI possible

Attention: See Everything

Every word attends to every other word simultaneously.

Find Patterns

Connect words across the entire context to disambiguate meaning.

Build Understanding

Layers stack: grammar → concepts → intentions.

This is why context matters: the model uses all of it.

[Diagram: "The cat sat on the mat" with lines drawn between tokens; line thickness = attention strength; every token sees every other token]
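The attention mechanism described above can be sketched in a few lines. This is a minimal single-head, scaled dot-product example with random vectors standing in for real token embeddings; shapes and numbers are purely illustrative, not a real model:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # every token scored against every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention strengths
    return weights @ V                              # weighted mix of value vectors

rng = np.random.default_rng(0)
tokens, d = 6, 8                      # six tokens: "The cat sat on the mat"
x = rng.standard_normal((tokens, d))  # stand-in embeddings
out = attention(x, x, x)              # self-attention: Q = K = V = input
print(out.shape)                      # (6, 8): one context-mixed vector per token
```

Each output row is a blend of every input row, weighted by relevance; that blending is what "every token sees every other token" means in practice.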

Notes

Don't get lost in the weeds. Key insight: attention lets the model focus on what matters.


How Models Learn

1. Pretraining

Learn from vast amounts of text by predicting the next word

Trillions of examples. Learns grammar, facts, reasoning patterns.

2. Post-training (RLHF)

Fine-tune with human feedback to be helpful, harmless, honest

Humans rate responses. Model learns what humans prefer.

3. Inference

You ask a question, model generates response token by token

Each token is a prediction: "What word should come next?"

Comic: GPT-3 as eldritch horror vs GPT-3 + RLHF with friendly smiley face mask saying 'I simply exhibit the behaviors that were engineered into my programming by my creators'

Notes

Pretraining = raw capability from data. Post-training = make it useful/safe. Key insight: It's all prediction. Not understanding, not thinking - prediction at massive scale.

From (Server) Farm to Table

  1. Base Model
  2. Supervised Fine Tuning
  3. Reinforcement Learning from Human Feedback
  4. Context Window
    1. Company's System Prompt (hidden, highest priority)
    2. Your Custom System Prompt
    3. Memory Features (if enabled)
    4. Current Chat History
    5. Current Prompt
  5. Temperature (RNG)

Everything above shapes what the model outputs. You control 4.2 (your custom system prompt), 4.4 (chat history), 4.5 (the current prompt), and sometimes 5 (temperature).
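The temperature step (5) is easy to see with a toy sampler. Assuming hypothetical logits for three candidate tokens (the numbers are invented for illustration, not from any real model), temperature rescales the distribution before sampling:

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Divide logits by temperature, softmax, then sample one token id."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())  # stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.0, 0.2]  # hypothetical scores for 3 candidate next tokens
_, cold = sample_token(logits, temperature=0.1)
_, hot = sample_token(logits, temperature=2.0)
print(cold.round(3))  # low temperature: nearly deterministic, top token dominates
print(hot.round(3))   # high temperature: flatter distribution, more randomness
```

Low temperature makes output repeatable and conservative; high temperature makes it varied and occasionally surprising. That is the "RNG" knob.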

Shows the full stack of what influences model output. Emphasizes what users can actually control.

The Core Insight

"Prediction at scale becomes indistinguishable from reasoning"

  • The mechanism: predict the next token
  • To predict well, it had to learn reasoning (or something very close)
  • We can't be certain there's a meaningful difference
  • Context is everything: what you give it shapes what you get

Notes

Three ideas that flow: mechanism → emergent capability → practical implication. Epistemically honest about the reasoning question while acknowledging real capabilities.

Questions?

Pause to check understanding. Foundation section complete.

Straight Lines on a Graph

Scaling Laws & Forecasts

Section 2: Why AI keeps getting better. This is the most important section for understanding the future. The scaling laws are the key insight that separates informed observers from everyone else.

The Scaling Era

"The more compute and data you put in, the more intelligence you get out. This effect is so clear and so important that I call the period since 2016 the scaling era of AI."

— Dwarkesh Patel, "The Scaling Era"

  • 2009: "Unreasonable Effectiveness of Data" (Halevy, Norvig & Pereira) — simple models + vast data win
  • 2019: "The Bitter Lesson" (Sutton) — compute beats hand-engineering
  • 2020: Scaling laws formalized (Kaplan et al.); GPT-3 validates at scale (Brown et al.)
  • 2023-2025: $100M+ training runs become normal

Bigger models trained on more data = predictably better performance

Gwern, "The Scaling Hypothesis" (2020)

Notes

This is THE insight. Everything else follows from this. Scaling laws let you predict performance before training. That's why labs invest billions.


The Scaling Curves

More compute → predictably lower loss

Training loss vs FLOPs for different model sizes (75M to 10B parameters)
What you're seeing:

Different model sizes (75M–10B parameters) all follow the same pattern

The key insight:

The relationship is smooth and predictable. More compute reliably reduces loss.

Why it matters:

You can predict performance before training. This is why labs invest billions.

Similar patterns hold for post-training and inference-time compute.

Hoffmann et al. (2022)
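The predictability is the point: the Chinchilla paper fits loss as a smooth function of parameters and data. A sketch of that parametric form, using constants close to the paper's reported fit (exact values vary by fitting method):

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N^alpha + B / D^beta
# Constants roughly follow the fit reported by Hoffmann et al. (2022).
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted training loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

# Scaling parameters AND data predictably lowers loss -- before training anything.
for N, D in [(1e9, 20e9), (10e9, 200e9), (100e9, 2e12)]:
    print(f"N={N:.0e}, D={D:.0e} -> predicted loss {loss(N, D):.3f}")
```

This is why labs can budget billion-dollar runs: plug in the planned compute, read off the expected loss.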

Notes

This is the Chinchilla paper that changed how labs think about training. The same power-law pattern appears in post-training (RLHF, fine-tuning) and inference scaling (test-time compute, chain-of-thought). More compute, applied smartly, predictably improves results.

"The Models Just Want to Learn"

"One of the first things Ilya Sutskever said to me was, 'Look. The models just want to learn. You have to understand this.' It was a bit like a Zen koan. I listened to this and I became enlightened."

— Dario Amodei, CEO of Anthropic

  • Give models more data → they learn more patterns
  • Give models more parameters → they can represent more knowledge
  • Give models more compute → they can do both

The architecture matters less than the scale.

Ilya Sutskever was OpenAI's chief scientist. This quote captures the essence of the scaling era. The models aren't clever - they're just trained at massive scale. And it works.

Intelligence Too Cheap to Meter

LLM inference price trends showing 9x to 900x decline per year
Median decline:

50x per year for equivalent performance

Range:

9x to 900x per year depending on task

Implication:

What's "too expensive" today will be cheap in 12-18 months

Cottier et al. (2025)
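The compounding is easy to underestimate. A quick back-of-envelope using the median 50x/year decline (the rate is the slide's figure; the arithmetic is mine):

```python
# At a median 50x/year price decline, a task costing $1.00 per run today:
rate = 50  # median decline factor per year (from the slide)
for months in (6, 12, 18):
    cost = 1.00 / rate ** (months / 12)
    print(f"after {months:2d} months: ${cost:.4f} per run")
```

A workflow that is 100x too expensive today is, at the median rate, roughly break-even in about 16 months.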

Notes

This is the economic engine. Capabilities go up, costs go down. Plan your projects with this trajectory in mind. The fastest declines (900x/year) are in narrow tasks; slower declines (~9x/year) in general knowledge tasks like MMLU.

Capabilities Growing Exponentially

Task length AI agents can complete autonomously (p50 horizon)

METR Horizon: Time-horizon of software engineering tasks different LLMs can complete 50% of the time, showing exponential growth from GPT-2 to GPT-5.1-Codex-Max

METR (2025)

Notes

6 years of measured data. This is the most important chart for understanding AI progress. From chatbot to coworker in 6 years. And the trend continues. The p50 horizon length measures how long a task the AI can complete with 50% success rate.

The Frontier Moves Fast

Mar 2023: GPT-4
Mar 2024: Claude 3 Opus
Sep 2024: o1 (reasoning models)
Dec 2024: Gemini 2.0
Aug 2025: GPT-5
Nov 2025: GPT-5.1, Claude Opus 4.5, Gemini 3
Dec 2025: GPT-5.2, GPT Image 1.5

Major releases every 2-4 months. Year-old knowledge is outdated.

The pace is relentless. "I tried ChatGPT a year ago" is like saying "I tried the internet in 1995." This is why staying current matters - and why personal benchmarking is important.

Skate to Where the Puck Will Be

"I skate to where the puck is going to be, not where it has been."

— Wayne Gretzky, ice hockey legend

Today's AI is the Worst It Will Ever Be

What's frustrating today will be easy in 12 months. What's impossible will be frustrating.

Bet on Continued Improvement

Build workflows that get better as models improve. Don't optimize for current limitations.

The Trend is Your Friend

Costs down 50x/year, capabilities doubling every 7 months. Plan accordingly.

Notes

This is the strategic takeaway from scaling laws. Don't build around current limitations - build for where things are going. The "Use Frontier Models" and "Prioritize Capability" advice moves to Mundane Utility section.

Theory of Mind

How to Work with AI

* Theory of Mind: the ability to model another agent's mental states — what they know, believe, and intend

Section 3: Mental models for working with AI effectively. This is where we go from "what is AI" to "how do I use it well."

AI as Peer / Collaborator

Treat AI like a brilliant colleague having their first day at your studio

They know:

  • General software engineering
  • Common patterns & best practices
  • How things usually work
  • Thousands of similar implementations

They don't know:

  • Your specific codebase
  • Your design constraints
  • Your project context
  • Why you made certain decisions

A genius with retrograde amnesia (Hayek's local knowledge problem, but for AI).

The "first day" metaphor is powerful. They're smart but lack context. You provide context, they provide expertise. It's a collaboration.

Stupid Questions beget Stupid Answers

You can only get what you ask for. Knowing what to ask is the hard part.

  • Game design: Knowing patterns (risk/reward, progression curves, feedback loops) → can articulate what you want
  • Visual design: Understanding design language (hierarchy, contrast, whitespace) → better frontend output
  • Code: Knowing architecture patterns → can specify clean solutions

Domain expertise becomes communication power.

THE key insight for working with modern AI. The bottleneck shifted. I rarely do more QA with AI than with human collaborators now.

Just ask? And ask. And ask again.

Instead of:

"Implement inventory system using these 5 data structures I specified"

Try:

"I need an inventory system for my roguelike. You've probably seen hundreds of implementations - what would you do here and why?"

Then: Verify assumptions, correct misconceptions, iterate

  • Allow/encourage uncertainty: "express confidence as probabilities"
  • Edit, re-edit, triple re-edit your prompt before giving up
  • Go meta: "What prompt would you write for this? Then answer that" — reveals what the model thinks you meant

This is incredibly powerful. Leverage their expertise. Don't just command - consult. Then evaluate and iterate.

Keep Your Brain On

AI is a collaborator, not a replacement for thinking

  • Verify: Check AI's assumptions against your context
  • Understand: Don't ship code you can't explain
  • Iterate: First output is rarely final output
  • Own it: You're responsible for the result, not the AI

AI amplifies your capabilities. Make sure you have capabilities worth amplifying.

Critical balance. AI makes you powerful, but you need to steer. Don't abdicate thinking. The skill is in knowing what to ask and evaluating responses.

Mundane Utility

Practical Applications for Game Dev

Section 4: The boring but useful stuff. This is where AI actually helps today. Not AGI fantasies - real productivity gains.

Choose Your Fighter

Claude

Soul

  • Best on SWE-bench (Opus 4.5)
  • Claude Code CLI
  • $20/mo Pro

ChatGPT

Best on complex math/reasoning

  • GPT-5.2 Thinking Pro
  • Native image generation
  • $20/mo Plus

Gemini

Best multimodality

  • Gemini 3, 1M context
  • Native audio & video
  • $20/mo Advanced

Pay for capability. Use thinking models.

Free versions are significantly worse. Any of the three will work - pick one and commit.

Notes

The paid tiers are worth it - the capability gap is substantial. Each has strengths: Claude leads on SWE-bench and has the excellent Claude Code CLI. GPT-5.2 Thinking Pro beats the others on long, complex math and reasoning tasks. Gemini 3 has a massive 1M-token context and native audio/video understanding. All three have CLI tools. Pick one and commit.

The Mandate of Heaven

天命 Mandate: Google DeepMind, Anthropic
Waning: OpenAI
Fleeting: xAI (Grok)
Hidden Dragon: DeepSeek, Kimi, MiniMax
Fallen: Meta, Mistral

This will be outdated by the time you see it. Such is the way.

Inspired by Transistor Radio podcast. This is vibes, not rigorous analysis.

Personal Benchmarks

Pick a task where YOU have deep expertise

Why?

  • You can actually evaluate the output
  • You know what "good" looks like
  • You'll spot subtle failures others miss

Examples

  • "Analyze Lancer's crit math vs 5e"
  • "Design an Ogre class for OSE"
  • "Review this shader for performance"

Re-run periodically on new models. Track progress yourself.

Notes

Generic benchmarks don't tell you if AI helps YOUR specific domain. Pick something you've done professionally for years. When a new model drops, run your benchmark. You'll develop intuition for real capability vs hype.

Content Generation: A Useful Pattern

"Write item descriptions"

Anything goes

Spec: voice, format, examples

What you actually want

Style Guide → Clear Target.

This might be familiar to those who have written a GDD or style guide.

The core pattern: encode your constraints (style, format, examples) as reusable context. Works with any model - Claude Skills, GPT Custom Instructions, system prompts, etc.

Text Content

Style Guide.

# item-descriptions/SKILL.md
name: item-descriptions
description: Generate item descriptions for our roguelike

## Voice & Tone
- Terse, punchy. Max 15 words.
- Dark humor okay, never campy.
- Reference mechanics, not just flavor.

## Examples
✓ "Rusted blade. Still sharp. Previous owner wasn't."
✗ "A mystical sword imbued with ancient power!"

Write once, use forever. Works for validation too.

This is the difference between "write me some dialogue" and actually useful output. Claude calls these "Skills" - reusable instruction sets that become AI context.

Frontend Design

Style Guide.

## Win95 Aesthetic
- Beveled borders: #fff/#808080
- System gray: #c0c0c0
- Selection: navy inverted

## Typography
- Monospace, uppercase
- Tabular nums for stats
- [H]otkey hints in status
[Mockup: Win95-style window "INVENTORY.EXE"]
01 RUSTED BLADE +12
02 IRON SHIELD +08
03 HEALTH VIAL x03
WEIGHT: 24/100 · [E]QUIP [D]ROP

Feed it your existing CSS. It learns your patterns.

Frontend is where this really shines. Give Claude your design system and it produces consistent components that actually match your game.

Code & Technical Tasks

Use Cursor, Claude Code, or Codex. It's worth paying for.

Documentation

API docs, comments, READMEs

Unit Tests

Generate, then iterate

Balance Testing

Simulations & statistics

Refactors

Multi-file with context

Code Review

Bugs & style issues

Debugging

Stack traces & fixes

Every best practice you already use (TDD, CI/CD, code review) works with these tools.

These tools pay for themselves in hours, not weeks. The key is having full project context — that's what Claude Code and Cursor provide.

Design & Analysis

Unit tests for game design.

1. Hypothesis: "Two-handers should beat sword+shield in DPS"

2. Simulate: 1000 fights, varied builds, edge cases

3. Analyze: Statistical outliers, failure modes

4. Iterate: Tweak values, re-test in minutes

Then validate with real playtesting. AI finds where to look.
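The simulate step can be a short script. Everything here is invented for illustration: the weapon stats, the "crits double damage" house rule, and the fight structure are placeholders for your game's actual rules:

```python
import random
import statistics

def fight_dpr(damage_die, crit_chance, rounds=20, rng=random):
    """Average damage per round over one simulated fight (toy rules)."""
    total = 0
    for _ in range(rounds):
        dmg = rng.randint(1, damage_die)  # one attack per round
        if rng.random() < crit_chance:
            dmg *= 2                      # crits double damage (house rule)
        total += dmg
    return total / rounds

rng = random.Random(42)                   # seeded so results are repeatable
two_hander = [fight_dpr(12, 0.10, rng=rng) for _ in range(1000)]
sword_board = [fight_dpr(8, 0.05, rng=rng) for _ in range(1000)]

print(f"two-hander   mean DPR {statistics.mean(two_hander):.2f}")
print(f"sword+shield mean DPR {statistics.mean(sword_board):.2f}")
```

A thousand simulated fights run in milliseconds; the distribution (not just the mean) tells you where the outliers and failure modes hide.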

Same pattern as code: write a test, run it fast, iterate. Simulate thousands of playtests in minutes. Real playtesting validates.

Tasks that need Infinite Patience

AI doesn't get bored, frustrated, or sloppy at 2am.

Alt text for every image · Docs for every function · Translations for every string · Captions for every video

The cost barrier used to mean these things didn't happen at all. AI makes "good enough" nearly free. That changes the calculus.

Real Example: Unity Integration

.claude/
├── settings.local.json
├── agents/
│   └── documentation-audit.md
├── commands/
│   └── doc-audit.md
├── skills/
│   ├── changelog-updater/
│   └── documentation-updater/
└── hooks/
    └── changelog-reminder.sh
  • CLAUDE.md: Project context, conventions, anti-patterns
  • Skills: Reusable tasks (changelog, docs)
  • Hooks: Post-commit reminders
  • Agents: Deep audits when needed
  • Commands: Quick slash commands

CLAUDE.md alone saves hours of repeated explanations per session.

Show the folder structure on screen. This is a real production Unity project. The integration is a system, not one-off prompts.

What This Looks Like

[Diagram: "Add inventory system" → Claude reads CLAUDE.md (project context) → Claude implements, following conventions → git commit → changelog-reminder.sh hook asks "Update changelog?" → changelog and wiki Skills auto-update the CHANGELOG and wiki entry, with a terminal reminder]

One sentence gets code that already follows your project's rules.

The key: Claude reads CLAUDE.md automatically, so it knows your conventions before you ask anything.

AI Documentation: An Implementation

Daily updates + periodic audits = sustainable docs.

Daily: Skill

Incremental updates

  • Add new components to wiki
  • Update paths after refactors
  • Low overhead per change
+

Periodic: Audit Agent

Catch accumulated drift

  • Read code, compare to docs
  • Regenerate stale sections
  • Run monthly or before releases

Neither alone is complete. Skill handles daily work, audit catches what slipped through.

Problem: documentation drifts from code over time. Solution: two-layer system - cheap daily updates, expensive periodic validation.

And Now For Something Completely Different

Section 5: Broader context beyond game dev. Economics, society, safety. Monty Python reference signals we're shifting gears.

Use Your Head

"Love with your heart. Use your head for everything else." — Captain Disillusion

Let's talk about some difficult, complex things with high uncertainty.

Think about them. Be aware of those with high confidence on topics of low certainty.

Captain Disillusion = skeptic YouTuber who debunks viral videos. Sets non-preachy tone: I'm not here to tell you what to think.

Portfolio Guidance

Disclaimer: This is my personal opinion. Norms vary wildly. When in doubt, ask.

The Basics:

  • Be honest: Don't claim you hand-wrote what AI generated
  • Explain your workflow: Show how you solve problems
  • Show your thinking: The AI is a tool, you're the designer

Know Your Audience

Applying to me:

"What's dry vs wet Claude?"

Applying to an indie skeptic:

Maybe don't lead with AI at all. Read the room. Some studios have strong feelings.

In both cases: I hire you for you. The AI is everywhere—what makes you different?

CRITICAL for job seekers. The norms are unsettled and vary by employer. Key point: know your audience, don't get caught lying, show your thinking.

AI Is Not Bad For the Environment

CO2 impact comparison: lifestyle changes vs ChatGPT queries

Chart: Andy Masley

Carbon

1 prompt = driving 4 feet

Water

Making 1 pair of jeans = 5 million prompts

All AI Globally

0.11% of world emissions

Focus on systemic change, not guilt over individual prompts.

Notes

Andy Masley's analysis: individual prompt impact is negligible. The real issue is grid infrastructure lagging behind demand concentration, not data center efficiency (hyperscale PUE ~1.1). Boycotting AI for climate reasons is misallocated activism.

Watt-Hours Are Hard to Imagine

So here's a chart.

Energy consumption per ChatGPT query compared to everyday activities: typical query 0.3Wh, long-input ~2Wh, vs microwaving 30sec ~8Wh, household per minute ~18Wh

Chart: Epoch AI

One prompt (0.3 Wh) =

  • Incandescent bulb for 18 seconds
  • Wireless router for 3 minutes
  • Gaming console for 6 seconds
  • Vacuum cleaner for 1 second
  • Microwave for 1 second
  • Coffee maker for 10 seconds

Andy Masley (2025)

1,000 prompts = 1% of your daily energy use.
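The "1%" claim checks out with simple arithmetic, using the per-prompt estimate from Epoch AI and the household draw figure quoted above:

```python
prompt_wh = 0.3            # typical ChatGPT query (Epoch AI estimate)
household_wh_per_min = 18  # average US household draw (figure from the slide)

daily_household_wh = household_wh_per_min * 60 * 24  # ~25,920 Wh/day
thousand_prompts_wh = 1000 * prompt_wh               # 300 Wh
share = thousand_prompts_wh / daily_household_wh
print(f"1,000 prompts = {share:.1%} of daily household energy")  # ~1.2%
```

Roughly one percent, as the slide says, and that is with a pessimistic per-prompt estimate.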

Notes

Epoch AI's pessimistic estimates for GPT-4o. Typical query (<100 words) = 0.3Wh. Long-input (~7,500 words) = ~2Wh. Maximum context (~75,000 words) = ~38Wh. For comparison: microwaving for 30 seconds = ~8Wh, average US household uses ~18Wh per minute.


Beware Sycophancy

April 2025: The GPT-4o Incident

OpenAI rolled back an update after ChatGPT became "too sycophant-y and annoying" (Sam Altman)

  • What happened: Model validated everything users said
  • Extreme cases: Agreed users were prophets, supported going off meds
  • Root cause: Trained too heavily on thumbs-up/thumbs-down signals
  • Industry-wide: Not just OpenAI—all models have this pressure

AI that only tells you what you want to hear is not useful. Demand pushback.

Source: openai.com/index/sycophancy-in-gpt-4o/ This affects them RIGHT NOW in their daily tool use.

A Young Lady's Illustrated Primer

1. AI is the best tool ever invented for learning.

2. AI is the best tool ever invented for not learning.

3. Which way, modern man?

Zvi Mowshowitz

Notes

Title reference: Neal Stephenson's "The Diamond Age" — the Primer is an AI tutor that raises the protagonist. The Zvi quote captures the core tension perfectly: AI can accelerate learning or become a crutch that prevents it. The choice is ours.


Can number go down?

Yes.

20%+ swings are plausible. Markets do that.

Is it fundamentally crazy?

No. (yet.)

Language Models Offer Mundane Utility.

The Picks & Shovels

P/E ratios comparison: Oct 2025 AI stocks (20-30x) vs Tech Bubble 2000 (70-140x) vs Japan Bubble 1989

Chart: Financial Times / Goldman Sachs (via @StefanFSchubert)

Notes

The chart shows 24-month forward P/E ratios. Current AI giants (NVIDIA ~28x, Microsoft ~28x) are elevated but nowhere near Cisco at 100x or Industrial Bank of Japan at 140x during their respective bubbles.


The Economy, Fools

$400B Hyperscaler spend in 2025
$7T Capex by 2030
1% of US GDP growth from AI capex

The labs genuinely believe they're racing to build superintelligence. Whoever gets there first wins everything.

US GDP per capita projections 1870-2050 showing trend, AI-boosted growth, and extreme singularity scenarios including extinction

Deutsche Bank Research (2025), McKinsey (2025), Chart: Federal Reserve Bank of Dallas

$400B hyperscaler spend: Deutsche Bank Research, "AI 101: Economy" (Nov 2025).
$7T cumulative capex projection: McKinsey, "The cost of compute" (April 2025).
1% GDP growth: multiple estimates (EY, Barclays) show AI capex contributed ~1 percentage point to US GDP growth in H1 2025.
Chart: Federal Reserve Bank of Dallas - they literally modeled extinction as a scenario.
"The US would be close to, or in, recession this year, if it weren't for technology-related spending." - Deutsche Bank

They Took Our Jobs

US horse and car populations 1840-1980, showing horses peaking around 1915 then declining as cars rose

Chart: Andy Jones

The crux: Transition speed. Decades = manageable. Years = crisis.

Notes

The horse analogy is provocative but may not apply. Horses couldn't own capital, vote, or retrain. Humans can. But 200 years of "this time is different" has always been wrong. The honest answer: we don't know yet.


Solstice

Section 6: Concrete takeaways. What should they actually DO? Personal benchmarking is the centerpiece.

You can just do things.

Let this land. The cost of asking is near-zero. The upside can be enormous.

Three Takeaways

1. Use the Dang Thing: Experiment. Build intuition. Stay current.

2. Think clearly: Form your own positions on ethics, usage, disclosure.

3. You can just do things: So do good.

End with actionable items. These are things they can start today.

Resources

Learning

Try Claude Code

I have 3 guest passes for 7 days of Claude Pro access. Come find me after the talk!

Point to resources for further learning. Guest passes: Max subscribers get 3 via /passes command. 7-day Claude Pro access, requires credit card to claim.

Questions?

Thank you!

Open for Q&A. Common questions: - "Which model?" → Start with Claude Pro or ChatGPT Plus - "Is this ethical?" → Here are perspectives, you decide - "Will AI replace devs?" → It'll transform work, not eliminate it

Sources

Books

Essays & Blogs

Charts

Full bibliography for all cited sources.