Active Inference Agency Formalization, Metrics, and Convergence Assessments
Eduard Kapelko

TL;DR
This paper formalizes the concept of agency in AI as a balance of curiosity and empowerment, providing metrics and convergence analysis to detect and understand mesa-optimization in large-scale models.
Contribution
It introduces a formal definition of agency, a new metric for measuring agency, and analyzes the convergence properties of agentic functions in AI systems.
Findings
The proposed agency function is smooth and convex, properties that favor optimization.
Agentic functions are rare but can emerge spontaneously during training.
The proposed metric effectively classifies and detects mesa-optimizers.
Abstract
This paper addresses the critical challenge of mesa-optimization in AI safety by providing a formal definition of agency and a framework for its analysis. Agency is conceptualized as a Continuous Representation of accumulated experience that achieves autopoiesis through a dynamic balance between curiosity (minimizing prediction error to ensure non-computability and novelty) and empowerment (maximizing the control channel's information capacity to ensure subjectivity and goal-directedness). Empirical evidence suggests that this active inference-based model successfully accounts for classical instrumental goals, such as self-preservation and resource acquisition. The analysis demonstrates that the proposed agency function is smooth and convex, possessing favorable properties for optimization. While agentic functions occupy a vanishingly small fraction of the total abstract function…
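The empowerment term in the abstract, the information capacity of the control channel, can be illustrated for a small discrete case. The sketch below assumes the common reading of empowerment as the channel capacity between an agent's actions and the resulting next states, and computes it with the standard Blahut-Arimoto iteration. The function name `empowerment_bits` and the toy channels are hypothetical and chosen for illustration; nothing in the abstract confirms that the paper computes empowerment this way.

```python
import math


def empowerment_bits(channel, iters=200, tol=1e-9):
    """Estimate channel capacity max_q I(A; S') in bits via Blahut-Arimoto.

    channel[a][s] is p(next state = s | action = a); each row must sum to 1.
    Returns (capacity estimate, capacity-achieving action distribution).
    """
    n_actions = len(channel)
    n_states = len(channel[0])
    q = [1.0 / n_actions] * n_actions  # start from a uniform action distribution
    lower = 0.0
    for _ in range(iters):
        # Marginal over next states induced by the current action distribution.
        r = [sum(q[a] * channel[a][s] for a in range(n_actions))
             for s in range(n_states)]
        # KL divergence between p(. | a) and the marginal r, per action, in bits.
        d = [sum(p * math.log2(p / r[s])
                 for s, p in enumerate(channel[a]) if p > 0.0)
             for a in range(n_actions)]
        weights = [q[a] * 2.0 ** d[a] for a in range(n_actions)]
        total = sum(weights)
        lower = math.log2(total)  # lower bound on the capacity
        upper = max(d)            # upper bound on the capacity
        q = [w / total for w in weights]  # reweight toward informative actions
        if upper - lower < tol:
            break
    return lower, q


# Hypothetical toy channels: full control versus no control over the next state.
full_control, _ = empowerment_bits([[1.0, 0.0], [0.0, 1.0]])
no_control, _ = empowerment_bits([[0.5, 0.5], [0.5, 0.5]])
```

In this toy setting an agent whose two actions deterministically select the next state has one bit of empowerment, while an agent whose actions have no effect on the next state has zero.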
Taxonomy
Topics: Embodied and Extended Cognition · Free Will and Agency · Computability, Logic, AI Algorithms
