How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks
Longju Bai, Zhemin Huang, Xingyao Wang, Jiao Sun, Rada Mihalcea, Erik Brynjolfsson, Alex Pentland, Jiaxin Pei

TL;DR
This study systematically analyzes token consumption in agentic coding tasks. It finds high variability, inefficiency, and difficulty in predicting token usage, all of which affect the economics of AI deployment.
Contribution
First comprehensive analysis of token consumption patterns in agentic tasks, highlighting variability, inefficiency, and prediction challenges across multiple frontier LLMs.
Findings
Agentic tasks consume 1000x more tokens than code reasoning or code chat tasks, with input tokens rather than output tokens driving the overall cost.
Token usage varies up to 30x for the same task and does not correlate with accuracy.
Models like Kimi-K2 and Claude-Sonnet-4.5 use significantly more tokens than GPT-5.
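To make the variability and input-share findings concrete, the following is a minimal sketch of how such statistics could be computed from per-run token logs. The record fields (`input_tokens`, `output_tokens`), the helper names, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper's data or code.

```python
from statistics import mean

def token_spread(runs):
    """Return (max/min ratio, mean) of total tokens across repeated runs of one task."""
    totals = [r["input_tokens"] + r["output_tokens"] for r in runs]
    return max(totals) / min(totals), mean(totals)

def input_share(runs):
    """Fraction of all tokens across the runs that are input tokens."""
    total_in = sum(r["input_tokens"] for r in runs)
    total_all = sum(r["input_tokens"] + r["output_tokens"] for r in runs)
    return total_in / total_all

# Hypothetical repeated runs of the same task (illustrative numbers only).
runs = [
    {"input_tokens": 95_000, "output_tokens": 5_000},
    {"input_tokens": 1_900_000, "output_tokens": 100_000},
    {"input_tokens": 2_850_000, "output_tokens": 150_000},
]

ratio, avg = token_spread(runs)
print(f"max/min spread: {ratio:.1f}x, mean tokens: {avg:,.0f}")
print(f"input share: {input_share(runs):.0%}")
```

A max/min ratio is only one possible measure of spread; it is sensitive to outliers, so a real analysis might prefer percentiles or the coefficient of variation.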
Abstract
The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a large number of tokens, three questions naturally arise: (1) Where do AI agents spend their tokens? (2) Which models are more token-efficient? and (3) Can agents predict their token usage before task execution? In this paper, we present the first systematic study of token consumption patterns in agentic coding tasks. We analyze trajectories from eight frontier LLMs on SWE-bench Verified and evaluate models' ability to predict their own token costs before task execution. We find that: (1) agentic tasks are uniquely expensive, consuming 1000x more tokens than code reasoning and code chat, with input tokens rather than output tokens driving the overall cost; (2) token usage is highly variable and inherently stochastic: runs on the…
