CellARC: Measuring Intelligence with Cellular Automata
Miroslav Lžičař

TL;DR
CellARC introduces a flexible cellular automata-based benchmark for evaluating AI models' abstraction and reasoning abilities, enabling controlled, reproducible studies of rule inference and generalization across diverse model architectures.
Contribution
The paper presents CellARC, a novel synthetic benchmark built from multicolor 1D cellular automata, with extensive datasets and evaluation protocols for analyzing model generalization and rule inference.
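The sketch below makes the setup concrete with a minimal multicolor 1D cellular automaton and a CellARC-style episode: five support input/output pairs plus one query, the episode format given in the abstract. It rests on several assumptions. The rule is a lookup table over the 2r+1 cell neighborhood, boundaries are periodic, each pair is related by a single update step, and the grid width of 12 is arbitrary. The function names are hypothetical and are not taken from the CellARC release.

```python
import random

def neighborhood_index(cells, i, r, k):
    """Encode the 2r+1 cells around position i as a base-k integer.
    Assumption: periodic (wrap-around) boundary; leftmost cell is the most significant digit."""
    n = len(cells)
    idx = 0
    for offset in range(-r, r + 1):
        idx = idx * k + cells[(i + offset) % n]
    return idx

def step(cells, rule_table, r, k):
    """One synchronous update of a k-color, radius-r 1D CA given as a lookup table."""
    return [rule_table[neighborhood_index(cells, i, r, k)] for i in range(len(cells))]

def random_rule(k, r, rng):
    """Uniformly random rule table: one output color for each of the k**(2r+1) neighborhoods."""
    return [rng.randrange(k) for _ in range(k ** (2 * r + 1))]

def make_episode(k=3, r=1, width=12, n_support=5, seed=0):
    """Hypothetical episode builder: n_support (input, output) pairs plus one query,
    all generated by the same hidden rule. Assumption: one CA step per pair."""
    rng = random.Random(seed)
    rule = random_rule(k, r, rng)

    def sample_pair():
        x = [rng.randrange(k) for _ in range(width)]
        return x, step(x, rule, r, k)

    support = [sample_pair() for _ in range(n_support)]
    query_input, query_target = sample_pair()
    return rule, support, query_input, query_target
```

A solver sees `support` and `query_input` and must predict `query_target`, which is only possible by inferring the hidden `rule` from the five examples:

```python
rule, support, q_in, q_out = make_episode()
for x, y in support:
    print(x, "->", y)
print("query:", q_in)
```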
Findings
A small (10M-parameter) vanilla transformer outperforms recent recursive models on the benchmark.
GPT-5 High, a much larger model, achieves higher accuracy, indicating benefits from scale.
Ensemble methods combining symbolic and neural models improve performance.
Abstract
We introduce CellARC, a synthetic benchmark for abstraction and reasoning built from multicolor 1D cellular automata (CA). Each episode has five support pairs and one query serialized in 256 tokens, enabling rapid iteration with small models while exposing a controllable task space with explicit knobs for alphabet size k, radius r, rule family, Langton's lambda, query coverage, and cell entropy. We release 95k training episodes plus two 1k test splits (interpolation/extrapolation) and evaluate symbolic, recurrent, convolutional, transformer, recursive, and LLM baselines. CellARC decouples generalization from anthropomorphic priors, supports unlimited difficulty-controlled sampling, and enables reproducible studies of how quickly models infer new rules under tight budgets. Our strongest small-model baseline (a 10M-parameter vanilla transformer) outperforms recent recursive models (TRM,…
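Langton's lambda, one of the difficulty knobs listed above, is the fraction of entries in a rule table that map a neighborhood to a non-quiescent state. The sketch below computes it and samples a rule near a target lambda using Langton's random-table construction. It assumes state 0 is quiescent, that the all-quiescent neighborhood must map to the quiescent state, and that neighborhoods are encoded as base-k integers with the leftmost cell most significant. CellARC's actual sampler may differ, and the function names are hypothetical.

```python
import random

def langton_lambda(rule_table, quiescent=0):
    """Fraction of neighborhoods whose output is not the quiescent state."""
    non_quiescent = sum(1 for out in rule_table if out != quiescent)
    return non_quiescent / len(rule_table)

def sample_rule_with_lambda(k, r, target_lambda, rng, quiescent=0):
    """Random-table method: each entry is a uniformly random non-quiescent state
    with probability target_lambda, otherwise the quiescent state."""
    n = 2 * r + 1
    others = [s for s in range(k) if s != quiescent]
    table = [
        rng.choice(others) if rng.random() < target_lambda else quiescent
        for _ in range(k ** n)
    ]
    # Quiescence condition (assumption): a neighborhood of all-quiescent cells stays quiescent.
    all_quiescent_idx = sum(quiescent * k ** j for j in range(n))
    table[all_quiescent_idx] = quiescent
    return table
```

Low lambda yields rules that mostly drive cells to the quiescent state, while values near the maximum yield more chaotic dynamics, which is why lambda works as a difficulty control:

```python
rng = random.Random(0)
rule = sample_rule_with_lambda(k=4, r=1, target_lambda=0.5, rng=rng)
print(round(langton_lambda(rule), 3))
```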
Taxonomy
Topics: Cellular Automata and Applications · Ferroelectric and Negative Capacitance Devices · Machine Learning and Algorithms
