ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence

ARC Prize Foundation

arXiv:2603.24621·cs.AI·April 20, 2026

TL;DR

ARC-AGI-3 is a new interactive benchmark designed to evaluate agentic intelligence in abstract, turn-based environments, emphasizing exploration, goal inference, and planning without relying on language or external knowledge.

Contribution

The paper introduces ARC-AGI-3, an interactive benchmark whose environment design and efficiency-based scoring framework, grounded in human action baselines, assess fluid adaptive efficiency on novel agentic tasks.

Findings

01 Humans can solve 100% of ARC-AGI-3 environments.

02 Frontier AI systems, as of March 2026, score below 1% on the benchmark.

03 The benchmark's efficiency-based scoring is grounded in human action baselines.

Abstract

We introduce ARC-AGI-3, an interactive benchmark for studying agentic intelligence through novel, abstract, turn-based environments in which agents must explore, infer goals, build internal models of environment dynamics, and plan effective action sequences without explicit instructions. Like its predecessors ARC-AGI-1 and 2, ARC-AGI-3 focuses entirely on evaluating fluid adaptive efficiency on novel tasks, while avoiding language and external knowledge. ARC-AGI-3 environments only leverage Core Knowledge priors and are difficulty-calibrated via extensive testing with human test-takers. Our testing shows humans can solve 100% of the environments, in contrast to frontier AI systems which, as of March 2026, score below 1%. In this paper, we present the benchmark design, its efficiency-based scoring framework grounded in human action baselines, and the methodology used to construct,…
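
The abstract describes the scoring as efficiency-based and grounded in human action baselines, but the visible text does not give the formula. The following Python sketch only illustrates what such a score could look like. It assumes that an unsolved environment scores zero and that a solved one is scored by the ratio of the human baseline action count to the agent's action count, capped at 1. The names `EpisodeResult`, `efficiency_score`, and `benchmark_score` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EpisodeResult:
    """Outcome of one agent run in one environment (hypothetical structure)."""
    solved: bool
    actions_taken: int


def efficiency_score(result: EpisodeResult, human_baseline_actions: int) -> float:
    """Illustrative per-environment score; NOT the paper's actual formula.

    Assumption: unsolved runs score 0.0; solved runs score
    human_baseline_actions / actions_taken, capped at 1.0.
    """
    if human_baseline_actions <= 0:
        raise ValueError("human_baseline_actions must be positive")
    if not result.solved or result.actions_taken <= 0:
        return 0.0
    return min(1.0, human_baseline_actions / result.actions_taken)


def benchmark_score(results: Sequence[EpisodeResult],
                    baselines: Sequence[int]) -> float:
    """Assumed aggregation: unweighted mean of per-environment scores."""
    if len(results) != len(baselines):
        raise ValueError("results and baselines must have the same length")
    if not results:
        return 0.0
    scores = [efficiency_score(r, b) for r, b in zip(results, baselines)]
    return sum(scores) / len(scores)
```

Under these assumptions, an agent that solves an environment in twice as many actions as the human baseline earns half credit for it, so the score rewards efficient solving rather than solving alone.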

Code & Models

Models

guychuk/arc-agi-3-grid-jepa
model

Datasets

Faei/ai-takeover-unlikely
dataset · 42 downloads
