Agentic Test-Time Scaling for WebAgents

Nicholas Lee; Lutfi Eren Erdogan; Chris Joseph John; Surya Krishnapillai; Michael W. Mahoney; Kurt Keutzer; Amir Gholami

arXiv:2602.12276·cs.AI·February 13, 2026

Open Access

TL;DR

This paper introduces CATTS, a dynamic compute allocation method for web agents that uses vote-derived uncertainty to improve performance and efficiency in multi-step tasks.

Contribution

The paper presents CATTS, which allocates per-step compute adaptively based on uncertainty derived from the agent's own vote distribution, and which outperforms naive uniform scaling on web agent tasks.

Findings

1. Uniform scaling saturates in long-horizon environments
2. Vote-based uncertainty correlates with success
3. CATTS improves performance and reduces token usage

Abstract

Test-time scaling has become a standard way to improve the performance and reliability of neural network models. However, its behavior on agentic, multi-step tasks remains less well understood: small per-step errors can compound over long horizons, and we find that naive policies that uniformly increase sampling show diminishing returns. In this work, we present CATTS, a simple technique for dynamically allocating compute for multi-step agents. We first conduct an empirical study of inference-time scaling for web agents. We find that uniformly increasing per-step compute quickly saturates in long-horizon environments. We then investigate stronger aggregation strategies, including an LLM-based Arbiter that can outperform naive voting, but that can overrule high-consensus decisions. We show that uncertainty statistics derived from the agent's own vote distribution (entropy and…
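
To make the idea of vote-derived uncertainty concrete, here is a minimal Python sketch of how the entropy of a vote distribution could be computed from several sampled candidate actions at one step. The function names (`vote_entropy`, `majority_vote`, `needs_extra_compute`) and the threshold value are hypothetical and chosen only for illustration; the excerpt above does not specify the exact statistics or decision rule CATTS uses, so this should not be read as the authors' implementation.

```python
import math
from collections import Counter


def vote_entropy(actions):
    """Shannon entropy (in nats) of the empirical vote distribution."""
    counts = Counter(actions)
    total = len(actions)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def majority_vote(actions):
    """Return the most frequently sampled action (ties go to the first seen)."""
    return Counter(actions).most_common(1)[0][0]


def needs_extra_compute(actions, entropy_threshold=0.5):
    # Hypothetical rule (an assumption, not taken from the paper):
    # spend more compute only when the votes disagree enough.
    return vote_entropy(actions) > entropy_threshold


# Example: five sampled candidate actions for a single step.
samples = ["click(3)", "click(3)", "type(7)", "click(3)", "scroll()"]
print(majority_vote(samples))
print(needs_extra_compute(samples))
```

Under this sketch, a unanimous vote has zero entropy and would be accepted without further sampling, while a split vote would trigger additional compute (for example, more samples or a stronger aggregator).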

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI