The Geometry of Benchmarks: A New Path Toward AGI
Przemyslaw Chojecki

TL;DR
This paper introduces a geometric framework that treats AI benchmarks as points in a structured space, offering a way to reason about autonomy and self-improvement on the path toward AGI.
Contribution
The paper develops a geometric approach to analyzing AI benchmarks, defines an Autonomous AI (AAI) Scale, and introduces a Generator-Verifier-Updater (GVU) operator for describing self-improvement.
Findings
Dense benchmark families certify performance across task regions
GVU operator generalizes reinforcement learning and self-play
Progress towards AGI is a flow on benchmark moduli driven by GVU dynamics
Abstract
Benchmarks are the primary tool for assessing progress in artificial intelligence (AI), yet current practice evaluates models on isolated test suites and provides little guidance for reasoning about generality or autonomous self-improvement. Here we introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space, and agent performance is described by capability functionals over this space. First, we define an Autonomous AI (AAI) Scale, a Kardashev-style hierarchy of autonomy grounded in measurable performance on batteries spanning families of tasks (for example reasoning, planning, tool use and long-horizon control). Second, we construct a moduli space of batteries, identifying equivalence classes of benchmarks that are indistinguishable at the level of agent orderings and capability inferences. This geometry yields…
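The abstract's notion of benchmarks that are "indistinguishable at the level of agent orderings" can be illustrated with a small sketch. This is not the paper's construction: the function names, the battery and agent labels, and the scores are all made up for illustration, ties are broken by agent name as a simplification, and the sketch covers only agent orderings, not the "capability inferences" the abstract also mentions.

```python
# Hypothetical illustration: group batteries by the agent ranking they induce.
# All names and scores below are invented for this example.

def agent_ordering(scores):
    """Return agents sorted by descending score (ties broken by agent name)."""
    return tuple(sorted(scores, key=lambda agent: (-scores[agent], agent)))

def same_ordering(battery_a, battery_b):
    """True if two batteries rank the agents identically."""
    return agent_ordering(battery_a) == agent_ordering(battery_b)

def equivalence_classes(batteries):
    """Partition battery names into groups that induce the same agent ranking."""
    classes = {}
    for name, scores in batteries.items():
        classes.setdefault(agent_ordering(scores), []).append(name)
    return list(classes.values())

# Made-up scores of three agents on three batteries.
batteries = {
    "battery_1": {"agent_x": 0.9, "agent_y": 0.5, "agent_z": 0.1},
    "battery_2": {"agent_x": 0.7, "agent_y": 0.6, "agent_z": 0.2},
    "battery_3": {"agent_x": 0.3, "agent_y": 0.8, "agent_z": 0.1},
}

classes = equivalence_classes(batteries)
print(classes)
```

In this toy data, `battery_1` and `battery_2` report different raw scores but rank the agents identically, so they fall into one class, while `battery_3` ranks them differently and forms its own.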
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition
