Measuring AI agent autonomy: Towards a scalable approach with code inspection
Peter Cihon, Merlin Stein, Gagan Bansal, Sam Manning, Kevin Xu

TL;DR
This paper proposes a scalable, code-based method for assessing AI agent autonomy. Because it inspects an agent's orchestration code instead of running the agent, it avoids run-time evaluations and makes autonomy assessment safer and more cost-effective.
Contribution
It introduces a code inspection framework that measures AI agent autonomy along two attributes, impact and oversight, and demonstrates it on the AutoGen framework.
Findings
Code-based assessment reduces evaluation costs and risks.
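The framework scores an agent's orchestration code against a taxonomy of autonomy attributes (impact and oversight).
Applying it to AutoGen and selected applications demonstrates its practical viability.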
Abstract
AI agents are AI systems that can achieve complex goals autonomously. Assessing the level of agent autonomy is crucial for understanding both their potential benefits and risks. Current assessments of autonomy often focus on specific risks and rely on run-time evaluations, that is, observations of agent actions during operation. We introduce a code-based assessment of autonomy that eliminates the need to run an AI agent to perform specific tasks, thereby reducing the costs and risks associated with run-time evaluations. Using this code-based framework, the orchestration code used to run an AI agent can be scored according to a taxonomy that assesses attributes of autonomy: impact and oversight. We demonstrate this approach with the AutoGen framework and select applications.
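To make the idea of scoring orchestration code without running it concrete, here is a minimal sketch. It parses Python source with the standard-library `ast` module and looks for two AutoGen-style agent keyword arguments, `human_input_mode` (as an oversight signal) and `code_execution_config` (as an impact signal). The 0/1/2 oversight scale, the `OVERSIGHT_BY_MODE` table, `AutonomyReport`, and `inspect_orchestration_code` are hypothetical illustrations, not the paper's taxonomy or scoring rules.

```python
import ast
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from AutoGen human_input_mode values to an oversight
# level (higher = more human oversight). Illustrative only, not the paper's scale.
OVERSIGHT_BY_MODE = {"ALWAYS": 2, "TERMINATE": 1, "NEVER": 0}


@dataclass
class AutonomyReport:
    """Findings from a static scan of agent orchestration code."""
    oversight_levels: list[int] = field(default_factory=list)  # one per literal human_input_mode found
    code_execution_enabled: bool = False                      # any agent configured to execute code

    @property
    def min_oversight(self) -> Optional[int]:
        # Assumption: the least-supervised agent bounds the system's oversight.
        return min(self.oversight_levels) if self.oversight_levels else None


def inspect_orchestration_code(source: str) -> AutonomyReport:
    """Scan source for AutoGen-style agent keywords; the code is parsed, never executed."""
    report = AutonomyReport()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            value = kw.value
            if kw.arg == "human_input_mode" and isinstance(value, ast.Constant):
                level = OVERSIGHT_BY_MODE.get(value.value)
                if level is not None:
                    report.oversight_levels.append(level)
            elif kw.arg == "code_execution_config":
                # A literal False disables code execution; any other value
                # (e.g. a dict with a work_dir) is treated as enabling it.
                disabled = isinstance(value, ast.Constant) and value.value is False
                if not disabled:
                    report.code_execution_enabled = True
    return report


# Usage: llm_config is undefined here, which is fine because nothing is executed.
example = '''
from autogen import AssistantAgent, UserProxyAgent
assistant = AssistantAgent("assistant", llm_config=llm_config)
user = UserProxyAgent("user", human_input_mode="NEVER",
                      code_execution_config={"work_dir": "coding"})
'''
report = inspect_orchestration_code(example)
print(report.min_oversight, report.code_execution_enabled)  # prints: 0 True
```

This sketch only sees keywords written explicitly in the source. A fuller scanner would also need to resolve each agent class's default settings and non-literal values, which this version ignores.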
