Mathematics and Coding are Universal AI Benchmarks
Przemyslaw Chojecki

TL;DR
This paper argues that mathematics and coding serve as universal benchmarks for AI evaluation: coding tasks alone are dense in the space of benchmarks, while mathematics offers spectral rather than expressive universality and, when paired with formal proof checkers, supports stable AI self-improvement.
Contribution
It introduces the Mathematics Fiber concept and proves the density of mathematical and coding tasks in the AI benchmark space, highlighting their universal evaluative role.
Findings
Coding tasks are dense in the AI benchmark space.
Mathematics provides spectral universality, not full expressiveness.
Formal proof systems enable stable self-improvement regimes.
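The density finding can be read as follows. This is a sketch in assumed notation, not the paper's own: the symbols $\mathfrak{M}$ (moduli space of batteries), $d_{\mathrm{eval}}$ (evaluation metric), and $\mathfrak{M}_{\mathrm{code}}$ (subspace generated by coding tasks) are hypothetical names introduced only for illustration.

```latex
% Assumed notation (not from the paper):
%   \mathfrak{M}                 moduli space of psychometric batteries
%   \mathfrak{M}_{\mathrm{code}} subspace of batteries generated by coding tasks
%   d_{\mathrm{eval}}            evaluation metric on \mathfrak{M}
\[
  \forall B \in \mathfrak{M},\ \forall \varepsilon > 0,\ 
  \exists B' \in \mathfrak{M}_{\mathrm{code}} :\quad
  d_{\mathrm{eval}}(B, B') < \varepsilon .
\]
```

According to the abstract, this holds under uniform tightness of agent outputs and a Lipschitz AAI functional, and the analogous statement with pure mathematical tasks in place of coding tasks does not hold.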
Abstract
We study the special role of mathematics and coding inside the moduli space of psychometric batteries for AI agents. Building on the AAI framework and GVU dynamics from previous works, we define the Mathematics Fiber and show that, when paired with formal proof kernels (e.g., Lean, Coq), GVU flows on this fiber admit spectrally stable self-improvement regimes due to oracle-like verification. Our main technical result is a density theorem: under uniform tightness of agent outputs and a Lipschitz AAI functional, the subspace of batteries generated by mathematical theorem-proving and coding tasks is dense in the moduli space of batteries with respect to the evaluation metric. Coding alone is universal in this sense, while pure mathematics is not; its privilege is spectral rather than expressive. We interpret this as evidence that mathematics and coding provide "universal coordinates" for…
Taxonomy
Topics: Computability, Logic, AI Algorithms · Embodied and Extended Cognition · Explainable Artificial Intelligence (XAI)
