Is there a half-life for the success rates of AI agents?

Toby Ord

arXiv:2505.05115·cs.AI·May 9, 2025

TL;DR

This paper shows that, within the research-engineering task suite of Kwa et al. (2025), AI agent success rates decline exponentially with task length, so each agent can be characterised by a half-life. A simple model explains the pattern: a constant rate of failure during each minute a human would take to do the task.

Contribution

The paper introduces a simple exponential-decay model of AI agent performance on long tasks. A constant failure rate per minute of human task time ties success rate to task length and lets each agent's robustness be summarised by a single half-life.
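As a brief sketch of why a constant failure rate gives exponential decay and a half-life: the symbols below ($S$, $t$, $\lambda$, $T_{1/2}$) are notation assumed here for illustration and may differ from the paper's, and the boundary condition $S(0) = 1$ (a zero-length task always succeeds) is likewise an assumption.

```latex
% Assumed notation: S(t) = success rate on a task that takes a human t minutes,
% \lambda = constant failure rate per minute, T_{1/2} = half-life in minutes.
\begin{align}
  \frac{dS}{dt} &= -\lambda\, S(t), \qquad S(0) = 1
    && \text{(constant failure rate per minute)} \\
  S(t) &= e^{-\lambda t}
    && \text{(exponential decline with task length)} \\
  S(T_{1/2}) = \tfrac{1}{2}
    \;&\Longrightarrow\; T_{1/2} = \frac{\ln 2}{\lambda}
    && \text{(half-life)} \\
  S(t) &= 2^{-t / T_{1/2}}
    && \text{(same curve, written in terms of the half-life)}
\end{align}
```

Written this way, one number per agent (either $\lambda$ or $T_{1/2}$) fixes the whole success curve.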

Findings

01. Success rates decline exponentially with task length.

02. Each agent can be characterised by its own half-life, which summarises how quickly its success rate decays as tasks get longer.

03. The model fits the empirical data from research-engineering tasks well.
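The findings above can be illustrated with a minimal Python sketch of the half-life model. The function names and the 60-minute half-life are hypothetical, chosen only for illustration; they are not values or code from the paper.

```python
import math


def success_rate(task_minutes: float, half_life_minutes: float) -> float:
    """Predicted success rate under exponential decay: halves every half-life."""
    return 0.5 ** (task_minutes / half_life_minutes)


def failure_rate_per_minute(half_life_minutes: float) -> float:
    """Constant failure rate (per minute) equivalent to a given half-life."""
    return math.log(2) / half_life_minutes


def task_length_for_success(target_rate: float, half_life_minutes: float) -> float:
    """Task length (minutes) at which the predicted success rate equals target_rate."""
    return half_life_minutes * math.log(target_rate) / math.log(0.5)


# Hypothetical agent with a 60-minute half-life (illustrative, not from the paper).
HALF_LIFE = 60.0
for minutes in (15, 60, 120, 240):
    print(f"{minutes:>4} min task: predicted success {success_rate(minutes, HALF_LIFE):.3f}")
print(f"80% success horizon: {task_length_for_success(0.8, HALF_LIFE):.1f} min")
```

Under this model, asking for a higher success rate shortens the usable task length considerably: the 80% horizon is roughly a third of the half-life.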

Abstract

Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model -- a constant rate of failing during each minute a human would take to do the task. This implies an exponentially declining success rate with the length of the task and that each agent could be characterised by its own half-life. This empirical regularity allows us to estimate the success rate for an agent at different task lengths. And the fact that this model is a good fit for the data is suggestive of the underlying causes of failure on longer tasks -- that they involve increasingly large sets of subtasks where failing any one fails the task. Whether this model applies more generally on other suites of tasks is unknown and an important subject for…
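The subtask explanation in the abstract can be sketched as follows, assuming (for illustration only) that subtasks succeed independently with the same probability and that failing any one fails the whole task. The function names and the 0.99 per-subtask success probability are hypothetical.

```python
import math


def chained_success(n_subtasks: float, p_subtask: float) -> float:
    """Chance of passing every subtask when each succeeds independently."""
    return p_subtask ** n_subtasks


def subtasks_to_halve(p_subtask: float) -> float:
    """Number of subtasks after which the overall success rate falls to 50%."""
    return math.log(0.5) / math.log(p_subtask)


# Hypothetical per-subtask success probability (illustrative only).
P = 0.99
for n in (10, 50, 100, 200):
    print(f"{n:>3} subtasks: overall success {chained_success(n, P):.3f}")
print(f"Success halves about every {subtasks_to_halve(P):.0f} subtasks")
```

If the number of subtasks grows in proportion to task length, this chained product is an exponential in task length, which is the shape the half-life model describes.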

Taxonomy

Topics: Ethics and Social Impacts of AI · Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing