A neural cellular automaton (NCA) is a grid of tiny identical programs. Each cell of the grid stores a little vector of numbers. At every tick, every cell looks only at itself and its eight immediate neighbors, runs the same small neural network as every other cell, and nudges its own state. That's the whole machine. There is no global controller, no attention, no pooling layer that sees the whole board — just one local rule, stamped everywhere, run over and over.
The famous NCA results are mostly about making images: growing an emoji from a single seed cell and regenerating it after it is cut in half (the classic Distill article), synthesizing textures, or having the pixels of an MNIST digit vote on which digit they belong to. Much less explored are NCAs as players: systems that watch a world through pixels, think in their hidden channels, and continuously act. Can a purely local rule track a moving ball, predict where it will bounce, and put a paddle there? Can it plan a route through a maze it has never seen?
This page is an attempt to find out, in the most hands-on way possible. Every demo below is a real trained NCA running live in your browser in plain JavaScript, using the same ~9,000 parameters that came out of PyTorch and verified to reproduce PyTorch's states to within a maximum deviation of about 10⁻⁷ over 5 steps. You can watch every hidden channel of their "minds" while they work, and you can interfere: drag the falling target, play Pong against one, or wall the maze solver into a corner and watch it re-plan.
1 · The machine
All four models on this page are the same architecture with different weights — the recipe from the training guide this project builds on. The grid state is a stack of 16 channels. The environment writes a few extra read-only observation channels (where the ball is, where the walls are). Each cell perceives its 3×3 neighborhood through four fixed filters — identity, two Sobel filters, and a Laplacian — so it knows "what's here, what changes left–right, what changes up–down, am I on an edge." A two-layer 1×1-convolution MLP (i.e., a per-cell MLP with shared weights) turns those 76 perceived features into a small additive state update:
```python
perceived = fixed_3x3_filters(concat(state, obs))  # 76 features per cell
delta = conv1x1(relu(conv1x1(perceived)))          # 96 hidden units
state += tanh(delta) * 0.1                         # gentle, bounded updates
```
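For readers who want to poke at the update outside the browser, here is a minimal, dependency-free sketch of one synchronous step of the rule described above, in plain Python. It is not the project's nca.js or training code. The kernel scaling (unnormalized Sobel, a 5-point Laplacian), zero padding at the borders, and the bias-free output layer are our assumptions; only the four filter types, the two-layer per-cell MLP, and the `tanh(delta) * 0.1` update come from the text.

```python
import math

# Assumed kernels: the text names the four filter types but not their scaling.
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
FILTERS = [IDENTITY, SOBEL_X, SOBEL_Y, LAPLACIAN]


def perceive(grid, y, x):
    """Return 4 * C features for cell (y, x) of a [H][W][C] grid.
    Cells outside the board count as zero (an assumed padding mode)."""
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    feats = []
    for kernel in FILTERS:
        for ch in range(c):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * grid[yy][xx][ch]
            feats.append(acc)
    return feats


def nca_step(state, obs, w1, b1, w2):
    """One synchronous update of every cell.
    state: [H][W][S] hidden channels; obs: [H][W][O] read-only observations.
    w1: HID rows of 4*(S+O) weights, b1: HID biases, w2: S rows of HID weights."""
    h, w = len(state), len(state[0])
    # Concatenate state and observation channels per cell (list concatenation).
    combined = [[state[y][x] + obs[y][x] for x in range(w)] for y in range(h)]
    new_state = []
    for y in range(h):
        row = []
        for x in range(w):
            p = perceive(combined, y, x)  # always reads the pre-update grid
            hidden = [max(0.0, sum(a * b for a, b in zip(w_row, p)) + bias)
                      for w_row, bias in zip(w1, b1)]
            delta = [sum(a * b for a, b in zip(w_row, hidden)) for w_row in w2]
            # Only the state channels are written; observations stay read-only.
            row.append([s + math.tanh(d) * 0.1 for s, d in zip(state[y][x], delta)])
        new_state.append(row)
    return new_state
```

Because `perceive` always reads the pre-update grid, every cell sees the same snapshot of its neighbors, and no information can move more than one cell per call.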
Two details matter more than they look. First, for the game-playing models each cell only updates with probability 0.5 per step ("stochastic updates"), which desynchronizes the grid and makes the learned dynamics robust. Second, the last layer is initialized to zero, so an untrained NCA does nothing at all — every behavior you see below had to be grown from "do nothing" by gradient descent through the rollout.
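Both details fit in a few lines. The sketch below (plain Python; the function name and nested-list layout are ours, not the project's API) applies the bounded update only to cells that fire with probability `fire_rate`. It also shows why a zero-initialized last layer makes an untrained NCA inert: its `delta` is exactly zero, and `tanh(0) * 0.1 = 0`, whatever the mask says.

```python
import math
import random


def stochastic_update(state, delta, fire_rate=0.5, rng=None):
    """Apply state += tanh(delta) * 0.1 only to cells that fire this step.
    state, delta: [H][W][C] nested lists. Each cell fires independently with
    probability fire_rate; fire_rate=1.0 gives deterministic, synchronous updates.
    Returns (new_state, number_of_cells_that_fired)."""
    rng = rng if rng is not None else random.Random()
    new_state, fired = [], 0
    for state_row, delta_row in zip(state, delta):
        new_row = []
        for cell, d in zip(state_row, delta_row):
            if rng.random() < fire_rate:
                fired += 1
                new_row.append([s + math.tanh(v) * 0.1 for s, v in zip(cell, d)])
            else:
                new_row.append(list(cell))  # this cell sits the step out
        new_state.append(new_row)
    return new_state, fired
```

In these terms, the maze model's deterministic updates correspond to `fire_rate=1.0`.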
To get actions out of a grid, one channel (marked ★ in the channel views below) is designated the action channel, and we read it out along one edge: softmax the brightness along the edge into a probability distribution and take its center of mass as a coordinate. The readout is part of the contract: the NCA must physically transport information across the grid to the edge where someone is listening. A cell can only talk to its neighbors, so news of the ball travels at most one cell per NCA step — the grid has a speed of light, and we'll measure it.
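A minimal sketch of that edge readout follows. It assumes a plain (temperature 1) softmax and edge cells placed at normalized coordinates `(i + 0.5) / n`. Both choices are ours, as is the function name.

```python
import math


def edge_readout(edge_values):
    """Softmax the action-channel values along one edge and return
    (center_of_mass, probabilities); the coordinate lies within [0, 1],
    with cell i centered at (i + 0.5) / n."""
    n = len(edge_values)
    m = max(edge_values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in edge_values]
    total = sum(exps)
    probs = [e / total for e in exps]
    coord = sum(p * (i + 0.5) / n for i, p in enumerate(probs))
    return coord, probs
```

Unlike an argmax, the softmax center of mass is differentiable, so gradients from the game loss can flow back through the readout into the grid. A side effect is that a diffuse, featureless channel reads out as the middle of the edge, so the NCA has to concentrate brightness to point anywhere else.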
2 · Warm-up: Catch
The training guide's worked example: a ball falls from the top of the screen, the NCA reads out an x-coordinate along the bottom edge, and a paddle glides toward that coordinate. Training combines dense supervision (cross-entropy toward the target's x position along the readout edge) with a behavioral tracking loss, backpropagated through 5–10 game-step unrolls. After about two minutes of training, it drops no balls at all in the final evaluation.
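The dense-supervision half of that loss might look like the following. It assumes a one-hot target on the edge cell under the ball. The real training code may smooth the target or weight it differently, the behavioral tracking loss is not shown, and the function name is hypothetical.

```python
import math


def edge_cross_entropy(edge_values, target_x):
    """Cross-entropy between softmax(edge_values) and a one-hot target on the
    edge cell containing target_x (normalized to [0, 1])."""
    n = len(edge_values)
    target = min(int(target_x * n), n - 1)  # index of the cell under the target
    m = max(edge_values)
    log_z = m + math.log(sum(math.exp(v - m) for v in edge_values))  # log-sum-exp
    return log_z - edge_values[target]
```

A featureless edge costs log(n). The loss falls toward zero only when the brightness piles up on the one cell under the ball.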
But the score is the least interesting part. Watch the hidden channels on the right: the falling ball drags a comet-trail of activity through the hidden state, and a bright column races down to the readout edge ahead of it. Drag the ball sideways mid-fall and watch the entire belief structure dissolve and re-form.
3 · Pong, and the speed of thought
Catch is nearly a perception task: the answer (the ball's x position) is visible at every moment. So let's make the answer invisible. In Pong, the ball bounces off the top and bottom walls; knowing where it will cross your paddle's plane means extrapolating its velocity through future bounces. A cell can't see velocity — it only sees a blob that was somewhere else one tick ago. Whatever trajectory prediction happens has to be assembled inside the hidden channels, by cells comparing the blob's position against the traces it left behind.
We trained the NCA to control the right paddle, reading out a y-coordinate along the right edge. Crucially, the supervision target was not the ball's current height: it was the analytically computed intercept — the spot where the ball will eventually cross the paddle plane, folding in every future wall bounce. The green cross in the demo marks that ground-truth intercept; the blue arrow is where the NCA currently "points."
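An analytically computed intercept can be sketched by unfolding the bounces: let the ball travel in a straight line as if the walls weren't there, then reflect the resulting height back into the board. This assumes a unit-height board with walls at y = 0 and y = 1, a point-sized ball, perfectly elastic bounces, and constant velocity. The function names are ours.

```python
def fold(y):
    """Reflect an unbounded height back into [0, 1] (walls at 0 and 1)."""
    r = y % 2.0  # the reflected path repeats every two board heights
    return 2.0 - r if r > 1.0 else r


def intercept(x, y, vx, vy, paddle_x):
    """Height at which a point ball reaches paddle_x, folding in every wall bounce.
    Returns (y_hit, ticks_remaining); the ball must be moving toward the paddle."""
    if vx == 0 or (paddle_x - x) / vx < 0:
        raise ValueError("ball is not moving toward the paddle plane")
    ticks = (paddle_x - x) / vx
    return fold(y + vy * ticks), ticks
```

Each additional bounce is just one more fold. That keeps the target cheap to compute for training, even though the NCA itself has to assemble the same geometry from local comparisons.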
Move your mouse over the game to take the left paddle; when you look away, an autopilot fills in for you. You can also throttle the NCA's thinking: the steps per tick slider controls how many NCA updates run between game ticks, which is exactly the grid's communication budget. We trained two paddle players: the standard one, and a fast-ball variant trained on vertical ball speeds faster than its own paddle (the checkbox swaps in its weights and serves faster balls along with them). Why that variant exists is the subject of the finding right after the demo.
Finding: gradient descent buys a tracker, not a prophet
Does the NCA chase the ball, or predict it? We measured, at every moment of long evaluation rollouts, the distance between the NCA's readout and (a) the ball's current height versus (b) the true future intercept, binned by how many ticks remain before the ball arrives. The figure below plots the difference — a prediction advantage that is positive when the readout is better explained by the future than by the present.
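In code, the measurement might look like this. We assume absolute distance in normalized board units and fixed-width bins of ticks remaining. The binning used for the actual figure is not stated, and the function name is hypothetical.

```python
from collections import defaultdict


def prediction_advantage(samples, bin_width=5):
    """samples: iterable of (readout_y, ball_y, intercept_y, ticks_remaining).
    Returns {bin_start: mean advantage}, where
    advantage = |readout - ball| - |readout - intercept|
    is positive when the readout sits nearer the future intercept than the ball."""
    sums, counts = defaultdict(float), defaultdict(int)
    for readout, ball, hit, ticks in samples:
        b = int(ticks // bin_width) * bin_width  # bin by ticks until arrival
        sums[b] += abs(readout - ball) - abs(readout - hit)
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

A perfect tracker (readout equal to the ball's height) scores −|ball − intercept|, and a perfect prophet scores +|ball − intercept|. Both score zero near arrival, where the two targets coincide.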
We genuinely expected prophecy, and we engineered for it: the supervision target was always the intercept, never the ball. We got a tracker anyway. For the standard model the advantage is negative at every meaningful horizon. In the moments that matter most (a wall bounce imminent, with the intercept more than half a board away from the ball), its readout sits 0.31 from the ball but 0.51 from the intercept. Tracking is simply the lower-loss attractor: as long as the paddle (speed 0.06/tick) outruns the ball's vertical drift (at most 0.035/tick), chasing works, and most of the urgency-weighted gradient comes from near-arrival moments, where ball and intercept coincide anyway.
So we raised the stakes: a second model was trained with vertical ball speeds up to 0.065/tick, faster than the paddle, a regime where pure chasing provably drops balls. That model shifts toward prediction (the bounce-imminent gap narrows from 0.20 to 0.09, and its advantage curve below is roughly half as negative through mid-horizons), but its advantage never crosses zero. Even doubling its thinking time doesn't flip it: the extra bandwidth makes it a better tracker (hit rate 76% → 86%) rather than a prophet. A local rule can carry trajectory information, and the blindfold experiment below shows that it does, but at this scale supervision alone didn't make full bounce-folding geometry worth the parameters.
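The speed argument can be checked with a toy one-dimensional model: a paddle capped at 0.06 per tick greedily chases the ball's current height while the ball bounces between walls at 0 and 1. Only the two speeds come from the text. The 40-tick approach, the starting height, and the function names are assumptions, and horizontal motion is ignored.

```python
def fold(y):
    """Reflect an unbounded height back into [0, 1] (walls at 0 and 1)."""
    r = y % 2.0
    return 2.0 - r if r > 1.0 else r


def chase_gap(y0, vy, paddle_speed=0.06, ticks=40):
    """Largest |paddle - ball| seen while a speed-capped paddle greedily chases
    the ball's current height (pure tracking, no prediction)."""
    paddle, worst = y0, 0.0
    for t in range(1, ticks + 1):
        ball = fold(y0 + vy * t)
        step = max(-paddle_speed, min(paddle_speed, ball - paddle))
        paddle += step
        worst = max(worst, abs(paddle - ball))
    return worst
```

With a drift of 0.035 the chaser never falls behind at all. At 0.065 it falls behind from the first tick and can only catch up when a bounce sends the ball back toward it, which is the regime the fast-ball model was trained in.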
Finding: the grid remembers the ball after you take it away
A sharper probe of internal state: blindfold the NCA mid-rally. We zero the ball's observation channel for a stretch of ticks right after it crosses midfield (heading toward the paddle), so the only trace of the trajectory is whatever the hidden channels are carrying. The models never experienced this in training. Try it yourself with the blindfold button in the demo above — hold it down and watch the hidden channels keep a ghost of the ball moving.
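The trigger can be sketched as a per-tick mask over the ball's observation channel. This assumes the NCA's paddle is on the right (as in the demo), x is normalized to [0, 1] with midfield at 0.5, and the default of 8 ticks matches the shortest blindfold in the results. The function name is ours.

```python
def blindfold_mask(ball_xs, midfield=0.5, duration=8):
    """ball_xs: the ball's x-position at each tick (NCA paddle on the right).
    Returns one boolean per tick: True where the ball observation channel is
    zeroed, i.e. for `duration` ticks starting when the ball crosses midfield
    heading right, toward the paddle."""
    mask = [False] * len(ball_xs)
    remaining = 0
    for t in range(1, len(ball_xs)):
        if ball_xs[t - 1] < midfield <= ball_xs[t]:  # crossed midfield moving right
            remaining = duration
        if remaining > 0:
            mask[t] = True
            remaining -= 1
    return mask
```

During masked ticks the environment writes zeros into the ball channel while the game keeps running, so the only record of the ball is whatever the hidden channels carry.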
The grid does carry the ball. Blind for 8 ticks (nearly a fifth of the approach), the standard model still returns 91% of balls, versus 95% fully sighted; blind for 16 ticks it holds 74%, and only at 24 ticks (over half the approach) does it collapse to 35%. The fast-ball model degrades along the same gentle slope. While the blindfold is on, the comet-wake in the hidden channels keeps gliding at the ball's learned speed with nothing feeding it. So the ingredients of prediction demonstrably exist: a persistent, moving internal estimate of the ball. The previous finding says the readout simply prefers to point that estimate at the ball rather than fold it forward through future bounces.
4 · Mazes: planning as physics
Games of reflexes are one thing; what about planning? There's a classical trick for turning planning into a local process: the value field. Imagine the goal cell holds a beacon at brightness 1.0, and brightness decays by a fixed amount per step of walkable distance. If such a field exists, an agent anywhere in the maze can navigate optimally by just walking uphill — all the intelligence lives in the field. Computing that field is exactly what algorithms like BFS or value iteration do, and value iteration is itself a local update rule. So: can an NCA learn to be a value-iteration machine, from examples alone?
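To make "value iteration is itself a local update rule" concrete, here is hand-written value iteration (not a learned NCA) on a grid. In each sweep, an open cell takes the best of its four neighbors minus a fixed decay, and the goal is pinned at 1.0. The 4-connected moves, the 1/120 decay (borrowed from the training target described next), and the function name are assumptions.

```python
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def value_iteration_step(value, walls, goal, decay=1 / 120):
    """One synchronous sweep: each open cell becomes max(neighbor values) - decay,
    floored at 0; walls stay 0 and the goal is pinned at 1.0."""
    h, w = len(value), len(value[0])
    new = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if walls[y][x]:
                continue
            if (y, x) == goal:
                new[y][x] = 1.0
                continue
            best = 0.0
            for dy, dx in NEIGHBORS:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and not walls[yy][xx]:
                    best = max(best, value[yy][xx])
            new[y][x] = max(0.0, best - decay)
    return new
```

After k sweeps, only cells within k steps of the goal have picked up any value. Information moves one cell per sweep, the same speed limit the NCA faces.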
We trained the same 16-channel architecture (deterministic updates this time — propagation tasks want synchrony) on thousands of random mazes: observation channels are the wall mask and the goal cell; the target for the action channel is 1 − distance/120, where distance is the true shortest-path distance in pixels. The loss is applied mid-rollout and late-rollout, so the field must form fast and stay put.
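That training target can be produced with an ordinary breadth-first search from the goal. In this sketch, walls and unreachable cells get target 0, moves are 4-connected, and the value is floored at zero beyond 120 steps. These choices and the function names are our assumptions.

```python
from collections import deque

MOVES = ((-1, 0), (1, 0), (0, -1), (0, 1))


def bfs_distances(walls, goal):
    """Shortest 4-connected path length from goal to every open cell (None if unreachable)."""
    h, w = len(walls), len(walls[0])
    dist = [[None] * w for _ in range(h)]
    gy, gx = goal
    dist[gy][gx] = 0
    queue = deque([goal])
    while queue:
        y, x = queue.popleft()
        for dy, dx in MOVES:
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w and not walls[yy][xx] and dist[yy][xx] is None:
                dist[yy][xx] = dist[y][x] + 1
                queue.append((yy, xx))
    return dist


def value_target(walls, goal, horizon=120):
    """Per-cell target 1 - distance/horizon, floored at 0; walls/unreachable -> 0."""
    dist = bfs_distances(walls, goal)
    return [[0.0 if d is None else max(0.0, 1.0 - d / horizon) for d in row]
            for row in dist]
```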
This demo is a sandbox. Draw walls while the field is flowing and watch it route around your pen. Move the goal and watch the old field drain while a new one floods out. Drop agents (white dots) anywhere — they climb the field greedily. Use zap to erase a circle of hidden state and watch it heal.
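The agents' rule can be sketched in a few lines. This assumes 4-connected moves and that an agent stops when no open neighbor is strictly higher than its current cell; the name `greedy_walk` is ours.

```python
def greedy_walk(field, walls, start, max_steps=1000):
    """Follow the value field uphill from start; return the list of visited cells."""
    h, w = len(field), len(field[0])
    y, x = start
    path = [start]
    for _ in range(max_steps):
        best, best_value = None, field[y][x]
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w and not walls[yy][xx] and field[yy][xx] > best_value:
                best, best_value = (yy, xx), field[yy][xx]
        if best is None:  # local maximum: the goal, or a flaw in the field
            break
        y, x = best
        path.append(best)
    return path
```

Because a walker parks on any local maximum, it only reaches the goal if the field's ordering is correct along its whole route.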
Finding: the field flows at the grid's speed of light
How fast does goal-ness propagate? The quantity a greedy walker actually cares about is local: does my steepest-uphill neighbor really lie closer to the goal? Below we plot the fraction of cells for which that is true, by true distance from the goal, snapshotted at several rollout times. A clean front of correctness sweeps outward at roughly 0.7–0.8 cells per NCA step — slightly below the hard limit of 1 cell/step imposed by the 3×3 neighborhood. (Interestingly, the raw value field doesn't grow as a tidy expanding ramp — the model hoists the whole field early and then carves correct slopes into it from the goal outward. Only the ordering becomes correct as a wavefront, and ordering is all the walker needs.)
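Given the NCA's field and the true BFS distances, the per-distance correctness curve could be computed as below. We read "steepest-uphill neighbor" as the highest-valued open 4-neighbor, with ties going to the first one found. That reading and the function name are our own, not a documented definition.

```python
from collections import defaultdict

STEPS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def routing_correctness(field, dist):
    """Fraction of cells, per true distance, whose highest-valued open neighbor
    is strictly closer to the goal.
    field: [H][W] action-channel values; dist: [H][W] true shortest-path
    distances, with None for walls and unreachable cells."""
    h, w = len(field), len(field[0])
    hits, totals = defaultdict(int), defaultdict(int)
    for y in range(h):
        for x in range(w):
            d = dist[y][x]
            if d is None or d == 0:  # skip walls, unreachable cells, and the goal
                continue
            neighbors = [(y + dy, x + dx) for dy, dx in STEPS
                         if 0 <= y + dy < h and 0 <= x + dx < w
                         and dist[y + dy][x + dx] is not None]
            if not neighbors:
                continue
            ny, nx = max(neighbors, key=lambda p: field[p[0]][p[1]])
            totals[d] += 1
            hits[d] += int(dist[ny][nx] < d)
    return {d: hits[d] / totals[d] for d in sorted(totals)}
```

This metric only checks ordering, so a field that is offset or rescaled everywhere still scores perfectly as long as its slopes point the right way.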
Finding: NCAs forget how to stop (and how to fix it)
Our first maze model had a dirty secret. Trained with rollouts of up to ~110 steps, it solved 78% of mazes, but if you simply kept it running, the beautiful field slowly warped and tore itself apart: the solve rate fell to 58% at 192 steps and 20% at 320. Nothing in training ever asked the dynamics to have a fixed point, so they didn't develop one. This is the same persistence problem the original growing-emoji work hit, and the same medicine works here: keep a pool of grid states that survive across training batches, so the model regularly trains on states far older than any single rollout. With the pool, long-horizon performance holds steady (the demo above has been running its field continuously since you scrolled here; check the step counter).
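A minimal, framework-free sketch of the sample-pool idea follows. `make_fresh` stands in for whatever builds a new maze with zeroed hidden state. Reseeding a fixed number of batch slots is a simplification: the Growing NCA recipe instead replaces the highest-loss sample with a fresh seed. The class and method names are hypothetical.

```python
import random


class StatePool:
    """Grid states that persist across training batches (a simplified sample pool).
    make_fresh: hypothetical callable returning a new board with zeroed hidden state."""

    def __init__(self, make_fresh, size, rng=None):
        self.make_fresh = make_fresh
        self.rng = rng if rng is not None else random.Random()
        self.states = [make_fresh() for _ in range(size)]

    def sample(self, batch_size, n_fresh=1):
        """Pick batch_size distinct pool slots; reseed the first n_fresh of them."""
        idx = self.rng.sample(range(len(self.states)), batch_size)
        batch = [self.states[i] for i in idx]
        for j in range(min(n_fresh, batch_size)):
            batch[j] = self.make_fresh()  # keep brand-new boards flowing in
        return idx, batch

    def commit(self, idx, final_states):
        """Write the post-rollout states back so later batches can continue them."""
        for i, s in zip(idx, final_states):
            self.states[i] = s
```

In a training loop, each step would call `sample`, run the differentiable rollout from that batch, compute the loss, and `commit` the (detached) final states. States that keep getting drawn accumulate age far beyond one rollout, which is what puts pressure on the dynamics to settle into a fixed point.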
5 · So… what can they do?
Four small models, one architecture, and one browser later, here is our answer in brief:
They can act. An 8,928-parameter local rule, with no global view of the board, learns closed-loop control well enough to be a frustrating Pong opponent — try beating it at 1.3× ball speed.
They can remember. The hidden channels hold a genuine ballistic model of the world: blindfolded mid-rally, the Pong NCA keeps returning balls it cannot see for a third of their approach. State isn't just smoothing — it's simulation.
But prediction has to be paid for. Despite supervision aimed squarely at the future intercept, both Pong models settled into tracking with only a partial predictive pull — because tracking was sufficient, and where it wasn't, dying was cheaper than learning geometry. If you want a local rule to be a prophet, you have to build a world where prophets win by a margin gradient descent can feel. We consider this the most useful negative result on the page.
They can plan, by becoming the algorithm. The maze NCA is, functionally, a learned value-iteration machine: a frontier of correct routing floods the corridors at ~0.7–0.8 cells per step and generalizes zero-shot to boards 4× larger than those it was trained on. Planning here isn't a search the network performs; it's a physics the network is.
And they have honest, measurable limits. Computation is bounded by communication: throttle Pong below two thinking steps per tick and skill collapses, exactly when information can no longer outrun the ball. Value fields cap out at their trained dynamic range (ours fades to zero 120 cells from the goal). And without explicit persistence training, every learned dynamic drifts when you run it past its training horizon.
What strikes us most, having watched these things for hours: the hidden channels are legible. Predicted intercepts show up as bright spots before the ball arrives; the maze field is literally the plan. When a model's entire computation is forced through spatially local, visualizable state, interpretability stops being an excavation and becomes spectatorship.
Where this could go
Everything here was trained with dense, hand-derivable supervision — the natural next question is whether behavioral signal alone (RL, or self-play between two NCAs across one board) can grow the same machinery. Two-player NCA Pong on a shared grid, where each paddle's readout edge is the other's far wall, seems delightfully unhinged. So does giving the maze solver a moving goal, or letting agents be part of the observation so the field learns to coordinate multiple walkers. The substrate clearly has more to give.
Notes & reproducibility
All four models: 16 state channels, 96 hidden units, and 8,928 parameters each, trained with AdamW through short differentiable rollouts on Apple Silicon (MPS) in 10–25 minutes each, following this training guide. The browser runtime is a dependency-free JavaScript reimplementation, checked against PyTorch to a maximum state error of ~10⁻⁷. Demos pause when scrolled off-screen. Code layout: train/ (PyTorch training + analysis), web/ (this page; nca.js is the whole runtime). Related reading: Growing NCA, Self-classifying MNIST, Texture NCA, and NCA pathfinding work by Earle et al.
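As a sanity check on the 8,928 figure, the count works out under two assumptions that we inferred rather than read from the code. First, 76 perceived features means 19 input channels (16 state plus 3 observation) passed through 4 fixed filters, which contribute no learned parameters. Second, the output layer has no bias; with one, the total would be 8,944.

```latex
\begin{aligned}
\text{perceived features} &= 4 \times (16 + 3) = 76 \\
\text{layer 1 } (76 \to 96,\ \text{with bias}) &= 76 \times 96 + 96 = 7392 \\
\text{layer 2 } (96 \to 16,\ \text{no bias}) &= 96 \times 16 = 1536 \\
\text{total} &= 7392 + 1536 = 8928
\end{aligned}
```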