ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence
ARC Prize Foundation

TL;DR
ARC-AGI-3 is a new interactive benchmark designed to evaluate agentic intelligence in abstract, turn-based environments, emphasizing exploration, goal inference, and planning without relying on language or external knowledge.
Contribution
The paper introduces ARC-AGI-3, an interactive benchmark that pairs an efficiency-based scoring framework, grounded in human action baselines, with difficulty-calibrated environment design to assess fluid adaptive efficiency in agentic tasks.
Findings
Humans solve 100% of ARC-AGI-3 environments.
As of March 2026, frontier AI systems score below 1% on the benchmark.
The benchmark's scoring is grounded in human action baselines.
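To make the idea of scoring grounded in human action baselines more concrete, here is a minimal Python sketch. It is not the paper's actual formula, which this summary does not state. It assumes a hypothetical per-environment score equal to the ratio of the human baseline action count to the agent's action count, capped at 1 and set to 0 for unsolved environments. The names `efficiency_score` and `benchmark_score` are illustrative only.

```python
from typing import Iterable, Tuple


def efficiency_score(agent_actions: int, human_baseline_actions: int, solved: bool) -> float:
    """Hypothetical per-environment score in [0, 1].

    ASSUMPTION: score = human baseline actions / agent actions, capped at 1.
    This ratio is an illustration, not the scoring rule defined in the paper.
    """
    if not solved:
        # An unsolved environment earns nothing, however few actions were used.
        return 0.0
    if agent_actions <= 0 or human_baseline_actions <= 0:
        raise ValueError("action counts must be positive")
    # Matching or beating the human baseline yields the maximum score of 1.
    return min(1.0, human_baseline_actions / agent_actions)


def benchmark_score(results: Iterable[Tuple[int, int, bool]]) -> float:
    """Unweighted mean of per-environment scores (also an assumption).

    Each result is (agent_actions, human_baseline_actions, solved).
    """
    scores = [efficiency_score(a, h, s) for a, h, s in results]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    runs = [(200, 100, True), (100, 100, True), (40, 100, False)]
    print(benchmark_score(runs))
```

Under these assumptions, an agent that solves an environment with twice as many actions as the human baseline earns half credit, so a low aggregate score can reflect either unsolved environments or inefficient solutions.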
Abstract
We introduce ARC-AGI-3, an interactive benchmark for studying agentic intelligence through novel, abstract, turn-based environments in which agents must explore, infer goals, build internal models of environment dynamics, and plan effective action sequences without explicit instructions. Like its predecessors ARC-AGI-1 and 2, ARC-AGI-3 focuses entirely on evaluating fluid adaptive efficiency on novel tasks, while avoiding language and external knowledge. ARC-AGI-3 environments only leverage Core Knowledge priors and are difficulty-calibrated via extensive testing with human test-takers. Our testing shows humans can solve 100% of the environments, in contrast to frontier AI systems which, as of March 2026, score below 1%. In this paper, we present the benchmark design, its efficiency-based scoring framework grounded in human action baselines, and the methodology used to construct,…
