The Geometry of Benchmarks: A New Path Toward AGI

Przemyslaw Chojecki

arXiv:2512.04276 · cs.AI · December 5, 2025

TL;DR

This paper introduces a geometric framework that treats benchmark batteries for AI agents as points in a structured moduli space, offering a way to reason about generality and autonomous self-improvement on the path toward AGI.

Contribution

The paper develops a geometric approach to analyzing AI benchmarks, defines an Autonomous AI (AAI) Scale, and introduces a Generator-Verifier-Updater (GVU) operator for modeling self-improvement.

Findings

01

Dense benchmark families certify performance across task regions

02

GVU operator generalizes reinforcement learning and self-play

03

Progress towards AGI is a flow on benchmark moduli driven by GVU dynamics

Abstract

Benchmarks are the primary tool for assessing progress in artificial intelligence (AI), yet current practice evaluates models on isolated test suites and provides little guidance for reasoning about generality or autonomous self-improvement. Here we introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space, and agent performance is described by capability functionals over this space. First, we define an Autonomous AI (AAI) Scale, a Kardashev-style hierarchy of autonomy grounded in measurable performance on batteries spanning families of tasks (for example reasoning, planning, tool use and long-horizon control). Second, we construct a moduli space of batteries, identifying equivalence classes of benchmarks that are indistinguishable at the level of agent orderings and capability inferences. This geometry yields…
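
The idea of benchmarks that are "indistinguishable at the level of agent orderings" can be illustrated with a small sketch. This is not the paper's construction: it assumes the simplest possible notion of equivalence, namely that two batteries rank the same agents in the same order. The function names, battery names, agent names, and scores are all hypothetical.

```python
from collections import defaultdict


def induced_ordering(scores):
    """Rank agents best-to-worst by score; ties are broken by agent name."""
    return tuple(sorted(scores, key=lambda agent: (-scores[agent], agent)))


def group_by_ordering(batteries):
    """Group batteries that induce the same agent ranking.

    Assumption: 'equivalent' here means only 'same ranking of agents',
    a toy stand-in for the equivalence classes described in the abstract.
    """
    classes = defaultdict(list)
    for name, scores in batteries.items():
        classes[induced_ordering(scores)].append(name)
    return dict(classes)


# Hypothetical scores for three agents on three batteries.
batteries = {
    "battery_a": {"agent_x": 0.9, "agent_y": 0.6, "agent_z": 0.3},
    "battery_b": {"agent_x": 0.8, "agent_y": 0.5, "agent_z": 0.1},
    "battery_c": {"agent_x": 0.2, "agent_y": 0.7, "agent_z": 0.4},
}

classes = group_by_ordering(batteries)
for ordering, members in classes.items():
    print(ordering, members)
```

In this toy data, battery_a and battery_b fall into one class because they rank the agents identically despite different raw scores, while battery_c forms its own class.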

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition