What Can Neural Cellular Automata Do, Really?

Teaching a grid of tiny identical neural networks to catch, to play Pong against you, and to solve mazes you draw — all running live on this page.

An interactive research note · models trained & article written by Claude (Fable 5) · June 2026

A neural cellular automaton (NCA) is a grid of tiny identical programs. Each cell of the grid stores a little vector of numbers. At every tick, every cell looks only at itself and its eight immediate neighbors, runs the same small neural network as every other cell, and nudges its own state. That's the whole machine. There is no global controller, no attention, no pooling layer that sees the whole board — just one local rule, stamped everywhere, run over and over.

The famous NCA results are mostly about making images: growing an emoji from a single seed cell and regenerating it after it is cut in half (the classic Distill article), synthesizing textures, or having the pixels of an MNIST digit vote on which digit they belong to. What's been explored much less is NCAs as players: systems that watch a world through pixels, think in their hidden channels, and continuously act. Can a purely local rule track a moving ball, predict where it will bounce, and put a paddle there? Can it plan a route through a maze it has never seen?

This page is an attempt to find out, in the most hands-on way possible. Every demo below is a real trained NCA running live in your browser in plain JavaScript: the same ~9,000 parameters that came out of PyTorch, verified to reproduce the PyTorch states almost exactly (max deviation ~10⁻⁷ over 5 steps). You can watch every hidden channel of their "minds" while they work, and you can interfere: drag the falling target, play Pong against one, wall the maze solver into a corner and watch it re-plan.

1 · The machine

All four models on this page are the same architecture with different weights, following the recipe from the training guide this project builds on. The grid state is a stack of 16 channels. The environment writes a few extra read-only observation channels (where the ball is, where the walls are). Each cell perceives its 3×3 neighborhood through four fixed filters (identity, two Sobel filters, and a Laplacian), so it knows "what's here, what changes left–right, what changes up–down, am I on an edge." A two-layer 1×1-convolution MLP (i.e., a per-cell MLP with shared weights) turns those 76 perceived features (4 filters × 19 input channels) into a small additive state update:

perceived = fixed_3x3_filters(concat(state, obs))     # 76 features per cell
delta     = conv1x1(relu(conv1x1(perceived)))         # 96 hidden units
state    += tanh(delta) * 0.1                          # gentle, bounded updates

Two details matter more than they look. First, for the game-playing models each cell only updates with probability 0.5 per step ("stochastic updates"), which desynchronizes the grid and makes the learned dynamics robust. Second, the last layer is initialized to zero, so an untrained NCA does nothing at all — every behavior you see below had to be grown from "do nothing" by gradient descent through the rollout.
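Here is a minimal pure-Python sketch of one update step. It makes several assumptions: zero padding at the borders, unnormalized Sobel kernels, a 5-point Laplacian, a bias on the hidden layer, and no bias on the output layer. The kernel scaling and padding are not given above. The bias choice is inferred only because it is the combination that reproduces the stated 8,928 parameters (76×96 + 96 + 96×16). All names here are hypothetical and are not the API of the project's nca.js.

```python
import math
import random

# Fixed 3x3 perception kernels (scaling and padding are assumptions, not the project's values).
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
FILTERS = [IDENTITY, SOBEL_X, SOBEL_Y, LAPLACIAN]


def perceive(channels):
    """Apply every kernel to every channel (zero padding); return a feature list per cell."""
    h, w = len(channels[0]), len(channels[0][0])
    feats = [[[] for _ in range(w)] for _ in range(h)]
    for ch in channels:
        for k in FILTERS:
            for y in range(h):
                for x in range(w):
                    s = 0.0
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            yy, xx = y + dy, x + dx
                            if 0 <= yy < h and 0 <= xx < w:
                                s += k[dy + 1][dx + 1] * ch[yy][xx]
                    feats[y][x].append(s)
    return feats


def count_params(n_in=76, n_hidden=96, n_out=16):
    # Hidden layer with bias, output layer without bias (assumed).
    return n_in * n_hidden + n_hidden + n_hidden * n_out


def init_params(rng, n_in=76, n_hidden=96, n_out=16):
    w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[0.0] * n_hidden for _ in range(n_out)]  # zero init: an untrained NCA does nothing
    return w1, b1, w2


def nca_step(state, obs, params, rng, update_prob=0.5):
    """One step: perceive, per-cell MLP, bounded residual update on a random subset of cells."""
    w1, b1, w2 = params
    feats = perceive(state + obs)  # concat(state, obs) along the channel axis
    h, w = len(state[0]), len(state[0][0])
    new = [[row[:] for row in ch] for ch in state]
    for y in range(h):
        for x in range(w):
            if rng.random() >= update_prob:
                continue  # stochastic update: this cell keeps its state this step
            f = feats[y][x]
            hidden = [max(0.0, sum(a * b for a, b in zip(row, f)) + bias) for row, bias in zip(w1, b1)]
            for c, row in enumerate(w2):
                delta = sum(a * b for a, b in zip(row, hidden))
                new[c][y][x] += 0.1 * math.tanh(delta)  # gentle, bounded update
    return new
```

Passing update_prob=1.0 gives the deterministic, synchronous update described for the maze model. Because w2 starts at zero, a freshly initialized step returns the state unchanged.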

To get actions out of a grid, one channel (marked ★ in the channel views below) is designated the action channel, and we read it out along one edge: softmax the brightness along the edge into a probability distribution and take its center of mass as a coordinate. The readout is part of the contract: the NCA must physically transport information across the grid to the edge where someone is listening. A cell can only talk to its neighbors, so news of the ball travels at most one cell per NCA step — the grid has a speed of light, and we'll measure it.
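The edge readout takes only a few lines. This sketch assumes a softmax temperature of 1 and maps cell centers to normalized coordinates in [0, 1]; neither detail is specified above, and `edge_readout` is a hypothetical name.

```python
import math


def edge_readout(edge_values, temperature=1.0):
    """Softmax the action channel along one edge; return (distribution, center of mass in [0, 1])."""
    m = max(edge_values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in edge_values]
    z = sum(exps)
    probs = [e / z for e in exps]
    n = len(edge_values)
    # Cell i is centered at (i + 0.5) / n in normalized board coordinates (assumed convention).
    coord = sum(p * (i + 0.5) / n for i, p in enumerate(probs))
    return probs, coord


# Example: the bottom row of a 32-wide action channel with one bright cell at column 8.
bottom_row = [0.0] * 32
bottom_row[8] = 6.0
probs, x = edge_readout(bottom_row)
```

Every other cell keeps a little probability mass, so a single bright cell is pulled slightly toward the middle of the edge. The brighter the peak is relative to its surroundings, the closer the readout lands to that cell.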

2 · Warm-up: Catch

The training guide's worked example: a ball falls from the top of the screen; the NCA reads out an x-coordinate along the bottom edge; a paddle glides toward that coordinate. Training combines dense supervision (cross-entropy toward the target's x position along the readout edge) with a behavioral tracking loss, through 5–10 game-step unrolls. After about two minutes of training, it does not drop a single ball in the final evaluation.

But the score is the least interesting part. Watch the hidden channels on the right: the falling ball drags a comet-trail of activity through the hidden state, and a bright column races down to the readout edge ahead of it. Drag the ball sideways mid-fall and watch the entire belief structure dissolve and re-form.

Catch — live
Drag on the game to move the falling ball. Orange = ball, cyan = paddle, gray = floor (these are the literal observation channels the NCA sees). Below the game: the readout distribution along the bottom edge (blue marker = chosen coordinate, green = truth).
All 16 hidden channels, live (★ = action channel)
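The dense Catch supervision can be sketched as a cross-entropy between the edge softmax and the cell containing the ball's true x. The paddle then moves toward the readout, clamped to a maximum speed per tick. The one-hot target, the cell-binning convention, and the 0.06/tick speed are assumptions; the speed is borrowed from the Pong paddle described later. The behavioral tracking loss is not specified, so it is left out.

```python
import math


def edge_cross_entropy(edge_logits, target_coord):
    """-log softmax(edge_logits)[cell containing target_coord]; target_coord is in [0, 1]."""
    n = len(edge_logits)
    target_idx = min(n - 1, max(0, int(target_coord * n)))
    m = max(edge_logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in edge_logits))  # stable log-sum-exp
    return log_z - edge_logits[target_idx]


def glide_paddle(paddle_x, readout_x, max_speed=0.06):
    """Move the paddle toward the NCA's readout, by at most max_speed per game tick."""
    step = max(-max_speed, min(max_speed, readout_x - paddle_x))
    return paddle_x + step
```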

3 · Pong, and the speed of thought

Catch is nearly a perception task: the answer (the ball's x position) is visible at every moment. So let's make the answer invisible. In Pong, the ball bounces off the top and bottom walls; knowing where it will cross your paddle's plane means extrapolating its velocity through future bounces. A cell can't see velocity — it only sees a blob that was somewhere else one tick ago. Whatever trajectory prediction happens has to be assembled inside the hidden channels, by cells comparing the blob's position against the traces it left behind.

We trained the NCA to control the right paddle, reading out a y-coordinate along the right edge. Crucially, the supervision target was not the ball's current height: it was the analytically computed intercept — the spot where the ball will eventually cross the paddle plane, folding in every future wall bounce. The green cross in the demo marks that ground-truth intercept; the blue arrow is where the NCA currently "points."
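An intercept like this can be computed by unfolding the wall bounces: let the ball fly in a straight line past the walls, then reflect the final height back into the board. This is a minimal sketch, not necessarily how the project computes it. It assumes a point ball, walls at heights 0 and 1, constant velocity per tick, and a paddle plane at fixed x; the function names are hypothetical.

```python
def fold(y):
    """Map an unbounded height into [0, 1] as repeated elastic bounces off walls at 0 and 1 would."""
    y = y % 2.0  # the motion is periodic with period 2 (out to one wall and back)
    return 2.0 - y if y > 1.0 else y


def intercept(ball_x, ball_y, vx, vy, paddle_x):
    """Return (height, ticks) where the ball reaches x == paddle_x, or None if it never does."""
    if vx == 0:
        return None
    ticks = (paddle_x - ball_x) / vx
    if ticks < 0:
        return None  # moving away from the paddle plane
    return fold(ball_y + vy * ticks), ticks
```

For example, take a ball at height 0.9, moving toward the y = 1 wall at 0.02 per tick, 45 ticks from the paddle. Naively it would reach 1.8; one bounce folds that to 0.2.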

Move your mouse over the game to take the left paddle. When you look away, an autopilot fills in for you. You can also throttle the NCA's thinking: the steps per tick slider controls how many NCA updates run between game ticks, which is exactly the communication budget of the grid. We trained two paddle players: the standard one, and a fast-ball variant trained on vertical ball speeds faster than its own paddle. The checkbox swaps in the fast-ball weights, together with faster serves; the finding right after the demo explains why this variant exists.
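A toy stand-in for the NCA shows why steps per tick act as a communication budget: a 1D local rule where each cell takes the maximum of itself and its two neighbors. This rule is an illustrative assumption, not the trained model. It spreads a signal as fast as any radius-1 rule can, one cell per step, so k steps per game tick cap information at k cells per tick.

```python
def local_max_step(row):
    """Toy radius-1 rule: each cell becomes the max over itself and its immediate neighbors."""
    n = len(row)
    return [max(row[max(0, i - 1):min(n, i + 2)]) for i in range(n)]


def signal_reach(width, source, ticks, steps_per_tick):
    """How many cells away from `source` a signal has spread after `ticks` game ticks."""
    row = [0.0] * width
    row[source] = 1.0
    for _ in range(ticks):
        for _ in range(steps_per_tick):  # NCA updates run between consecutive game ticks
            row = local_max_step(row)
    return max(abs(i - source) for i, v in enumerate(row) if v > 0)
```

If the ball moves b cells per tick, news of it keeps pace only when steps_per_tick is at least b. Below that, the ball outruns its own signal.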

Pong vs. the NCA — live
You are the green paddle (left, mouse). The NCA is the cyan paddle (right). Green cross = true intercept, blue arrow = NCA's readout. The vertical strip right of the game is the readout distribution along the right edge.
All 16 hidden channels, live (★ = action channel)

Finding: gradient descent buys a tracker, not a prophet

Does the NCA chase the ball, or predict it? We measured, at every moment of long evaluation rollouts, the distance between the NCA's readout and (a) the ball's current height versus (b) the true future intercept, binned by how many ticks remain before the ball arrives. The figure below plots the difference — a prediction advantage that is positive when the readout is better explained by the future than by the present.

We genuinely expected prophecy, and we engineered for it: the supervision target was always the intercept, never the ball. We got a tracker anyway. For the standard model the advantage is negative at every meaningful horizon. In the moments that matter most (a wall bounce imminent, intercept more than half a board away from the ball), its readout sits at error 0.31 from the ball but 0.51 from the intercept.

Tracking is simply the lower-loss attractor. When the paddle (speed 0.06/tick) outruns the ball's vertical drift (≤0.035/tick), chasing works, and most of the urgency-weighted gradient comes from near-arrival moments where ball and intercept coincide.

So we raised the stakes with a second model, trained with vertical ball speeds up to 0.065/tick. That is faster than the paddle, so pure chasing provably drops balls. This model shifts toward prediction: the bounce-imminent gap narrows from 0.20 to 0.09, and its advantage curve below is roughly half as negative through mid-horizons. But it never crosses zero. Even granting it double thinking time doesn't flip it; extra bandwidth makes it a better tracker (hit rate 76%→86%), not a prophet. A local rule can carry trajectory information (the blindfold experiment below shows that it does). At this scale, though, supervision alone didn't make full bounce-folding geometry worth the parameters.

Prediction advantage = (readout error vs. the ball's current height) − (readout error vs. the true future intercept), binned by time until the ball reaches the paddle plane. Above zero: the readout is better explained as a prediction of where the ball will land than as tracking of where it is.
Hit rate vs. NCA updates per game tick, varied at evaluation time only (both models were trained at 3; each is evaluated on its own ball distribution). Below ~2 steps/tick, information from the ball can no longer outrun the ball itself, and play collapses. You can reproduce this live with the slider above.
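The prediction-advantage measure defined above can be sketched as follows. It assumes absolute error on normalized heights and fixed-width bins of ticks-to-arrival. The sample format and the `prediction_advantage` name are hypothetical.

```python
from collections import defaultdict


def prediction_advantage(samples, bin_width=5):
    """samples: iterable of (readout_y, ball_y, intercept_y, ticks_to_arrival).

    Returns {bin_start: mean(|readout - ball| - |readout - intercept|)}; a positive value means
    the readout is better explained by the future intercept than by the ball's current height.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for readout, ball, target, ticks in samples:
        b = int(ticks // bin_width) * bin_width
        sums[b] += abs(readout - ball) - abs(readout - target)
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```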

Finding: the grid remembers the ball after you take it away

A sharper probe of internal state: blindfold the NCA mid-rally. We zero the ball's observation channel for a stretch of ticks right after it crosses midfield (heading toward the paddle), so the only trace of the trajectory is whatever the hidden channels are carrying. The models never experienced this in training. Try it yourself with the blindfold button in the demo above — hold it down and watch the hidden channels keep a ghost of the ball moving.

Hit rate when the ball's observation channel is blanked for the first N ticks after it crosses midfield moving toward the paddle (~45-tick crossings). The NCA must coast on hidden-state memory alone. Neither model ever saw a blink during training.
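The blindfold protocol can be sketched as a small wrapper around the ball's observation channel. The sketch assumes three things. The NCA's paddle is on the right, so "toward the paddle" means vx > 0. Midfield is at x = 0.5. The ball channel is stored as a list of rows. `Blindfold` is a hypothetical helper, not the demo's code.

```python
class Blindfold:
    """Blank the ball channel for the first n_blind ticks after the ball crosses midfield toward the paddle."""

    def __init__(self, n_blind, midfield=0.5):
        self.n_blind, self.midfield = n_blind, midfield
        self.prev_x = None
        self.ticks_since_cross = None  # None until the first qualifying crossing

    def apply(self, ball_x, vx, ball_channel):
        crossed = (self.prev_x is not None and vx > 0
                   and self.prev_x < self.midfield <= ball_x)
        if crossed:
            self.ticks_since_cross = 0
        elif self.ticks_since_cross is not None:
            self.ticks_since_cross += 1
        self.prev_x = ball_x
        if self.ticks_since_cross is not None and self.ticks_since_cross < self.n_blind:
            return [[0.0] * len(row) for row in ball_channel]  # the NCA sees no ball at all
        return ball_channel
```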

4 · Mazes: planning as physics

Games of reflexes are one thing; what about planning? There's a classical trick for turning planning into a local process: the value field. Imagine the goal cell holds a beacon at brightness 1.0, and brightness decays by a fixed amount per step of walkable distance. If such a field exists, an agent anywhere in the maze can navigate optimally by just walking uphill — all the intelligence lives in the field. Computing that field is exactly what algorithms like BFS or value iteration do, and value iteration is itself a local update rule. So: can an NCA learn to be a value-iteration machine, from examples alone?
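That local update can be written directly. The sketch below is plain value iteration on a grid: the classical algorithm the NCA is asked to imitate, not the learned rule. It assumes 4-connected moves, a fixed goal value of 1.0, a constant decay per step, and values clamped at zero. After k synchronous sweeps, every cell within k steps of the goal holds its exact value. This one-cell-per-sweep wavefront is the hard limit referred to in the speed-of-light finding later on.

```python
def value_iteration(walls, goal, decay, sweeps):
    """Synchronous local relaxation: v(goal) = 1, v(cell) = max(0, best free 4-neighbor - decay)."""
    h, w = len(walls), len(walls[0])
    v = [[0.0] * w for _ in range(h)]
    v[goal[0]][goal[1]] = 1.0
    for _ in range(sweeps):
        nv = [row[:] for row in v]
        for y in range(h):
            for x in range(w):
                if walls[y][x] or (y, x) == goal:
                    continue
                neighbors = [v[y + dy][x + dx]
                             for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
                             if 0 <= y + dy < h and 0 <= x + dx < w and not walls[y + dy][x + dx]]
                nv[y][x] = max(0.0, max(neighbors, default=0.0) - decay)
        v = nv
    return v
```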

We trained the same 16-channel architecture (deterministic updates this time — propagation tasks want synchrony) on thousands of random mazes: observation channels are the wall mask and the goal cell; the target for the action channel is 1 − distance/120, where distance is the true shortest-path distance in pixels. The loss is applied mid-rollout and late-rollout, so the field must form fast and stay put.
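The training target can be sketched with a breadth-first search from the goal. The description above fixes only the 1 − distance/120 formula. Everything else here is an assumption: 4-connected moves, one grid cell per unit of distance, and 0 for walls, unreachable cells, and anything more than 120 steps away.

```python
from collections import deque


def value_target(walls, goal, scale=120.0):
    """Return (target, dist): target = max(0, 1 - dist/scale) on reachable free cells, else 0."""
    h, w = len(walls), len(walls[0])
    dist = [[None] * w for _ in range(h)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not walls[ny][nx] and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    target = [[0.0 if d is None else max(0.0, 1.0 - d / scale) for d in row] for row in dist]
    return target, dist
```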

This demo is a sandbox. Draw walls while the field is flowing and watch it route around your pen. Move the goal and watch the old field drain while a new one floods out. Drop agents (white dots) anywhere — they climb the field greedily. Use zap to erase a circle of hidden state and watch it heal.

Maze lab — live
The heatmap is the NCA's action channel — its learned value field (bright = close to goal). Green dot = goal. Pick a tool, then click/drag on the maze.
All 16 hidden channels, live (★ = action channel)
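A greedy walker like the agents dropped into the demo can be sketched as follows. The sketch assumes 4-connected moves and an agent that gives up when no neighbor is strictly higher. `greedy_walk` is a hypothetical name, and the demo's tie-breaking may differ.

```python
def greedy_walk(value, walls, start, goal, max_steps=1000):
    """Climb the value field one cell at a time; return (path, reached_goal)."""
    h, w = len(value), len(value[0])
    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == goal:
            return path, True
        y, x = pos
        nbrs = [(y + dy, x + dx) for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= y + dy < h and 0 <= x + dx < w and not walls[y + dy][x + dx]]
        best = max(nbrs, key=lambda p: value[p[0]][p[1]], default=None)
        if best is None or value[best[0]][best[1]] <= value[y][x]:
            return path, False  # stuck at a local maximum or on a plateau of the field
        pos = best
        path.append(pos)
    return path, pos == goal
```

The walker only ever moves strictly uphill, so it cannot loop. Its failure mode is stalling at a spurious local maximum, which is why the local ordering of the field matters more than its absolute values.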

Finding: the field flows at the grid's speed of light

How fast does goal-ness propagate? The quantity a greedy walker actually cares about is local: does my steepest-uphill neighbor really lie closer to the goal? Below we plot the fraction of cells for which that is true, by true distance from the goal, snapshotted at several rollout times. A clean front of correctness sweeps outward at roughly 0.7–0.8 cells per NCA step — slightly below the hard limit of 1 cell/step imposed by the 3×3 neighborhood. (Interestingly, the raw value field doesn't grow as a tidy expanding ramp — the model hoists the whole field early and then carves correct slopes into it from the goal outward. Only the ordering becomes correct as a wavefront, and ordering is all the walker needs.)

Fraction of free cells whose steepest-uphill neighbor (on the NCA's value channel) truly lies closer to the goal, vs. distance from the goal, at several rollout times. Dashed line ≈ chance in a corridor. The frontier of correct routing expands outward at ~0.7–0.8 cells per NCA step.
Maze solve rate (greedy agent on the NCA's field) vs. NCA step budget, for mazes of three sizes. The model was trained only on 33×33 mazes; 49×49 and 65×65 are zero-shot generalization to bigger boards than it ever saw.
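The routing-correctness measure plotted above can be sketched as follows. It takes the NCA's value channel and the true BFS distances, with None marking walls and unreachable cells. For every free non-goal cell, it checks whether the highest-valued free 4-neighbor is strictly closer to the goal. The 4-connectivity and the argmax tie-breaking are assumptions.

```python
from collections import defaultdict


def routing_accuracy(value, dist):
    """Return {true_distance: fraction of cells whose steepest-uphill neighbor is closer to the goal}."""
    h, w = len(dist), len(dist[0])
    hits, totals = defaultdict(int), defaultdict(int)
    for y in range(h):
        for x in range(w):
            d = dist[y][x]
            if d is None or d == 0:
                continue  # skip walls, unreachable cells, and the goal itself
            nbrs = [(y + dy, x + dx) for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= y + dy < h and 0 <= x + dx < w and dist[y + dy][x + dx] is not None]
            uphill = max(nbrs, key=lambda p: value[p[0]][p[1]])
            totals[d] += 1
            hits[d] += dist[uphill[0]][uphill[1]] < d
    return {d: hits[d] / totals[d] for d in sorted(totals)}
```

In a one-cell-wide corridor, an interior cell has one closer and one farther neighbor, so an uninformative field scores about 0.5 there. A dead end has only its closer neighbor and scores 1 trivially.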

Finding: NCAs forget how to stop (and how to fix it)

Our first maze model had a dirty secret. Trained with rollouts up to ~110 steps, it scored 78% — but if you just kept it running, the beautiful field slowly warped and tore itself apart: 58% solve rate at 192 steps, 20% at 320. Nothing in training ever asked the dynamics to have a fixed point, so they didn't. This is the same persistence problem the original growing-emoji work hit, and the same medicine works for game boards: keep a pool of grid states that survive across training batches, so the model regularly trains on states far older than any single rollout. With the pool, long-horizon performance holds steady (the demo above has been running its field continuously since you scrolled here — check the step counter).
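The pool idea can be sketched independently of the model. In this minimal version, states live in a fixed-size list; each training batch samples from it, and results are written back. The oldest sampled entry is replaced by a fresh seed. The growing-emoji work reset the highest-loss sample instead; age is used here only to keep the sketch model-free. `SamplePool`, `seed_fn`, and the age bookkeeping are hypothetical.

```python
import random


class SamplePool:
    """Persistent pool of grid states, so training regularly sees states older than one rollout."""

    def __init__(self, seed_fn, size, rng):
        self.seed_fn, self.rng = seed_fn, rng
        self.states = [seed_fn() for _ in range(size)]
        self.ages = [0] * size  # NCA steps each state has been evolved since its seed

    def sample(self, batch_size, n_reseed=1):
        idx = self.rng.sample(range(len(self.states)), batch_size)
        idx.sort(key=lambda i: self.ages[i], reverse=True)
        for i in idx[:n_reseed]:  # swap the oldest picks for fresh seeds
            self.states[i], self.ages[i] = self.seed_fn(), 0
        return idx, [self.states[i] for i in idx]

    def commit(self, idx, new_states, steps):
        """Write rolled-out states back so the next batch can continue from them."""
        for i, s in zip(idx, new_states):
            self.states[i] = s
            self.ages[i] += steps
```

With a 10-step stand-in rollout, states in the pool soon accumulate ages well beyond 10 steps. That is exactly the exposure to long-horizon states that single rollouts never provide.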

5 · So… what can they do?

Four small models, one architecture, and a browser later, here is our answer in brief:

They can act. An 8,928-parameter local rule, with no global view of the board, learns closed-loop control well enough to be a frustrating Pong opponent — try beating it at 1.3× ball speed.

They can remember. The hidden channels hold a genuine ballistic model of the world: blindfolded mid-rally, the Pong NCA keeps returning balls even when it cannot see them for a third of their approach. State isn't just smoothing; it's simulation.

But prediction has to be paid for. Despite supervision aimed squarely at the future intercept, both Pong models settled into tracking with only a partial predictive pull — because tracking was sufficient, and where it wasn't, dying was cheaper than learning geometry. If you want a local rule to be a prophet, you have to build a world where prophets win by a margin gradient descent can feel. We consider this the most useful negative result on the page.

They can plan, and they do it by becoming the algorithm. The maze NCA is, functionally, a learned value-iteration machine: a frontier of correct routing floods the corridors at ~0.7–0.8 cells per step and generalizes zero-shot to boards with about 4× the area of the 33×33 training mazes. Planning here isn't a search the network performs; it's a physics the network is.

And they have honest, measurable limits. Computation is bounded by communication: throttle Pong below two thinking steps per tick and skill collapses, exactly when information can no longer outrun the ball. Value fields cap out at their trained dynamic range (ours fades to zero 120 cells from the goal). And without explicit persistence training, every learned dynamic drifts when you run it past its training horizon.

What strikes us most, having watched these things for hours: the hidden channels are legible. Predicted intercepts show up as bright spots before the ball arrives; the maze field is literally the plan. When a model's entire computation is forced through spatially local, visualizable state, interpretability stops being an excavation and becomes spectatorship.

Where this could go

Everything here was trained with dense, hand-derivable supervision — the natural next question is whether behavioral signal alone (RL, or self-play between two NCAs across one board) can grow the same machinery. Two-player NCA Pong on a shared grid, where each paddle's readout edge is the other's far wall, seems delightfully unhinged. So does giving the maze solver a moving goal, or letting agents be part of the observation so the field learns to coordinate multiple walkers. The substrate clearly has more to give.

Notes & reproducibility

All four models: 16 state channels, 96 hidden units, 8,928 parameters, trained with AdamW through short differentiable-rollout unrolls on Apple-Silicon MPS in 10–25 minutes each, following this training guide. The browser runtime is a dependency-free JavaScript reimplementation, checked against PyTorch to ~10⁻⁷ max state error. Demos pause when scrolled off-screen. Code layout: train/ (PyTorch training + analysis), web/ (this page; nca.js is the whole runtime). Related reading: Growing NCA, Self-classifying MNIST, Texture NCA, and NCA pathfinding work by Earle et al.

Appendix: training curves

Catch: evaluation hit rate during training.
Pong: evaluation hit rate during training (vs. a wall on the left).
Maze: solve rate during training (96-step budget, 33×33).