Self-Improving AI Agents through Self-Play
Przemyslaw Chojecki

TL;DR
This paper develops a formal framework for self-improving AI agents using a flow-based model governed by a Generator-Verifier-Updater operator, establishing conditions for stability and unifying various self-play architectures.
Contribution
It introduces a formal dynamical systems approach to self-improvement in AI, deriving the Variance Inequality for stability and linking multiple architectures under this framework.
Findings
The GVU operator generates a vector field on agent parameters.
The Variance Inequality provides a spectral condition for stability.
Architectures like AlphaZero and GANs satisfy the inequality under certain conditions.
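The Generator-Verifier-Updater loop described in the findings can be illustrated with a minimal sketch. This is not the paper's construction: the quadratic `capability` functional, the hill-climbing updater, and every name and parameter below (`gvu_step`, `gen_scale`, `verifier_noise`, `run`) are assumptions chosen only to show how verifier noise can affect whether the loop improves.

```python
import random


def capability(theta):
    """Toy capability functional F(theta); maximized at theta = 0 (an assumption)."""
    return -sum(x * x for x in theta)


def gvu_step(theta, rng, n_candidates=16, gen_scale=0.1, verifier_noise=0.0):
    """One hypothetical Generator-Verifier-Updater step on a parameter vector."""
    base = capability(theta)
    # Generator: propose random perturbations of the current parameters.
    candidates = [
        [t + rng.gauss(0.0, gen_scale) for t in theta]
        for _ in range(n_candidates)
    ]
    # Verifier: estimate each candidate's capability gain, with optional noise.
    scores = [
        capability(c) - base + rng.gauss(0.0, verifier_noise)
        for c in candidates
    ]
    # Updater: adopt the best-scored candidate only if the verifier reports a gain.
    best = max(range(n_candidates), key=lambda i: scores[i])
    return candidates[best] if scores[best] > 0 else theta


def run(theta, steps, verifier_noise, rng):
    """Return the capability trajectory F(theta_0), ..., F(theta_steps)."""
    trajectory = [capability(theta)]
    for _ in range(steps):
        theta = gvu_step(theta, rng, verifier_noise=verifier_noise)
        trajectory.append(capability(theta))
    return trajectory


if __name__ == "__main__":
    for noise in (0.0, 1.0):
        traj = run([1.0, 1.0], steps=200, verifier_noise=noise, rng=random.Random(0))
        # Average capability gain per step: a crude finite-difference stand-in
        # for the coefficient of self-improvement.
        gain = (traj[-1] - traj[0]) / 200
        print(f"verifier_noise={noise}: mean gain per step = {gain:.4f}")
```

With an exact verifier (`verifier_noise=0.0`) the capability trajectory in this toy is non-decreasing by construction, because the updater accepts a candidate only when its true gain is positive. With a noisy verifier the updater can accept harmful candidates, which is the kind of failure a noise condition such as the Variance Inequality is meant to rule out.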
Abstract
We extend the moduli-theoretic framework of psychometric batteries to the domain of dynamical systems. While previous work established the AAI capability score as a static functional on the space of agent representations, this paper formalizes the agent as a flow parameterized by computational resource, governed by a recursive Generator-Verifier-Updater (GVU) operator. We prove that this operator generates a vector field on the parameter manifold, and we identify the coefficient of self-improvement as the Lie derivative of the capability functional along this flow. The central contribution of this work is the derivation of the Variance Inequality, a spectral condition that is sufficient (under mild regularity) for the stability of self-improvement. We show that a sufficient condition for is that, up to curvature and step-size effects, the…
Taxonomy
Topics: Computability, Logic, AI Algorithms · Machine Learning and Algorithms · Embodied and Extended Cognition
