Agentic Test-Time Scaling for WebAgents
Nicholas Lee, Lutfi Eren Erdogan, Chris Joseph John, Surya Krishnapillai, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

TL;DR
This paper introduces CATTS, a dynamic compute allocation method for web agents that uses vote-derived uncertainty to improve performance and efficiency in multi-step tasks.
Contribution
The paper presents CATTS, which allocates per-step compute adaptively based on uncertainty in the agent's own vote distribution and outperforms uniform scaling on web agent tasks.
Findings
Uniform scaling saturates in long-horizon environments
Vote-based uncertainty correlates with success
CATTS improves performance and reduces token usage
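To make the second and third findings concrete, here is a minimal Python sketch of how vote-derived uncertainty could gate extra compute at a single step. It is an illustration under stated assumptions, not the paper's implementation: the function names `vote_entropy` and `extra_samples_needed`, the normalization by log(n), the 0.5 threshold, and the extra budget of 4 samples are all hypothetical. Only entropy is used here, since it is the one statistic the summary names.

```python
from collections import Counter
from math import log


def vote_entropy(votes):
    """Normalized Shannon entropy of a list of sampled actions, in [0, 1].

    0.0 means every sample proposed the same action (full consensus);
    1.0 means every sample proposed a different action.
    Normalizing by log(n) is an assumption made for this sketch.
    """
    n = len(votes)
    if n == 0:
        raise ValueError("votes must not be empty")
    counts = Counter(votes)
    if len(counts) == 1:
        return 0.0  # unanimous vote; also covers n == 1
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    return entropy / log(n)


def extra_samples_needed(votes, threshold=0.5, extra=4):
    """Return how many additional samples to draw at this step.

    Hypothetical gating rule: spend more compute only when the initial
    votes disagree strongly (entropy above the threshold).
    """
    return extra if vote_entropy(votes) > threshold else 0


# Usage: a confident step keeps the cheap majority vote,
# while a contested step is given more samples.
confident = ["click(3)", "click(3)", "click(3)", "click(3)"]
contested = ["click(3)", "type(7)", "scroll(down)", "click(9)"]
print(extra_samples_needed(confident))  # no extra compute
print(extra_samples_needed(contested))  # extra samples requested
```

The design choice this illustrates is that steps with high consensus cost only the initial samples, so the token budget concentrates on the steps where the agent is uncertain rather than being spread uniformly across the trajectory.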
Abstract
Test-time scaling has become a standard way to improve performance and boost reliability of neural network models. However, its behavior on agentic, multi-step tasks remains less well understood: small per-step errors can compound over long horizons, and we find that naive policies that uniformly increase sampling show diminishing returns. In this work, we present CATTS, a simple technique for dynamically allocating compute for multi-step agents. We first conduct an empirical study of inference-time scaling for web agents. We find that uniformly increasing per-step compute quickly saturates in long-horizon environments. We then investigate stronger aggregation strategies, including an LLM-based Arbiter that can outperform naive voting but can also overrule high-consensus decisions. We show that uncertainty statistics derived from the agent's own vote distribution (entropy and…
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
