Measuring AI agent autonomy: Towards a scalable approach with code inspection

Peter Cihon; Merlin Stein; Gagan Bansal; Sam Manning; Kevin Xu

arXiv:2502.15212 · cs.AI · February 24, 2025

TL;DR

This paper proposes a scalable, code-based method for assessing AI agent autonomy that avoids run-time evaluations, enabling safer and more cost-effective analysis of autonomous capabilities.

Contribution

The paper introduces a code inspection framework for measuring AI agent autonomy along two attributes, impact and oversight, and demonstrates it on the AutoGen framework and select applications.

Findings

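1. Assessing autonomy from code, rather than from run-time evaluations, reduces evaluation cost and risk.

2. The framework scores an agent's orchestration code on two autonomy attributes: impact and oversight.

3. Applying the framework to AutoGen and select applications demonstrates the approach in practice.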

Abstract

AI agents are AI systems that can achieve complex goals autonomously. Assessing the level of agent autonomy is crucial for understanding both their potential benefits and risks. Current assessments of autonomy often focus on specific risks and rely on run-time evaluations -- observations of agent actions during operation. We introduce a code-based assessment of autonomy that eliminates the need to run an AI agent to perform specific tasks, thereby reducing the costs and risks associated with run-time evaluations. Using this code-based framework, the orchestration code used to run an AI agent can be scored according to a taxonomy that assesses attributes of autonomy: impact and oversight. We demonstrate this approach with the AutoGen framework and select applications.
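
To make the idea of scoring orchestration code without running the agent more concrete, here is a minimal sketch using Python's standard `ast` module. It is not the paper's implementation. It assumes the orchestration code constructs AutoGen agents with the `human_input_mode` and `code_execution_config` keyword arguments, and the `OVERSIGHT_BY_MODE` labels, the `inspect_orchestration` helper, and the `can_execute_code` flag (used here as a rough stand-in for impact) are hypothetical names chosen for illustration.

```python
import ast

# Hypothetical mapping from AutoGen's human_input_mode values to an
# oversight label. These labels are illustrative, not the paper's taxonomy.
OVERSIGHT_BY_MODE = {
    "ALWAYS": "high",       # a human is prompted on every turn
    "TERMINATE": "medium",  # a human is prompted only at termination
    "NEVER": "low",         # no human input is requested
}


def inspect_orchestration(source: str) -> list[dict]:
    """Statically scan orchestration code; the code is parsed, never executed."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg}
        mode = kwargs.get("human_input_mode")
        exec_cfg = kwargs.get("code_execution_config")
        if mode is None and exec_cfg is None:
            continue  # this call sets neither attribute of interest
        finding = {"call": ast.unparse(node.func), "line": node.lineno}
        if isinstance(mode, ast.Constant):
            finding["oversight"] = OVERSIGHT_BY_MODE.get(mode.value, "unknown")
        if exec_cfg is not None:
            # A literal False disables code execution; any other value is
            # treated as "execution enabled" in this sketch.
            disabled = isinstance(exec_cfg, ast.Constant) and exec_cfg.value is False
            finding["can_execute_code"] = not disabled
        findings.append(finding)
    return findings


# Example orchestration snippet, held as a string so nothing is run.
SAMPLE = '''
user = UserProxyAgent(
    "user",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "out"},
)
'''

findings = inspect_orchestration(SAMPLE)
print(findings)
```

Because the source is only parsed, the agent never takes an action during the assessment, which is the property the abstract highlights. A fuller assessment would need to handle defaults, values passed through variables, and attributes set outside constructor calls, none of which this sketch attempts.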
