LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Tong Zheng; Haolin Liu; Chengsong Huang; Huiwen Bao; Sheng Zhang; Rui Liu; Runpeng Dai; Ruibo Chen; Chenxi Liu; Tianyi Xiong; Xidong Wu; Hongming Zhang; and Heng Huang

arXiv:2605.08083·cs.CL·May 13, 2026

TL;DR

AutoTTS introduces an environment-driven framework for automatic test-time scaling strategy discovery in large language models, improving performance and efficiency over manual heuristics.

Contribution

AutoTTS shifts TTS strategy design from hand-tuned heuristics to automated, environment-based discovery, with the aim of producing strategies that scale and generalize.

Findings

01

Discovered strategies improve accuracy-cost tradeoff on benchmarks.

02

Strategies generalize across benchmarks and model scales.

03

Discovery process is efficient, costing only 39.9 and 160 minutes.

Abstract

Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose an environment-driven framework, AutoTTS, that changes what researchers design: from individual TTS heuristics to environments where TTS strategies can be discovered automatically. The key to AutoTTS lies in environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width-depth TTS as controller synthesis over pre-collected reasoning trajectories and probe signals, where controllers decide when to…
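To make the idea of a controller replayed over pre-collected trajectories concrete, here is a minimal Python sketch. It is not the AutoTTS implementation. The abstract is cut off before it lists the controller's decisions, so everything below is an illustrative assumption: the `Trajectory` type, the `run_controller` function, the agreement-based stopping rule, and the cost measure (one unit per reasoning chunk consumed) are all hypothetical names and choices. The point is only that, once trajectories and probe answers are stored, a candidate controller can be scored offline without calling the model again.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical record: probes[d] is the answer a probe would extract
    # from this trajectory after d + 1 reasoning chunks.
    probes: list[str]


def run_controller(trajs, width, agree_threshold, max_depth):
    """Replay stored trajectories under a simple illustrative controller.

    Width: how many trajectories are used. Depth: how many chunks each is
    advanced. The controller stops as soon as the most common probe answer
    reaches `agree_threshold` of the active trajectories.
    Returns (answer, cost), where cost counts chunks consumed.
    """
    active = trajs[:width]
    cost = 0
    answer = None
    for depth in range(max_depth):
        # Only trajectories that still have a chunk at this depth are billed.
        live = [t for t in active if depth < len(t.probes)]
        if not live:
            break
        cost += len(live)
        # Finished trajectories keep voting with their last probe answer.
        votes = Counter(t.probes[min(depth, len(t.probes) - 1)] for t in active)
        answer, count = votes.most_common(1)[0]
        if count / len(active) >= agree_threshold:
            break
    return answer, cost


# Toy pre-collected data (made up for illustration).
trajs = [
    Trajectory(["7", "42", "42"]),
    Trajectory(["42", "42", "42"]),
    Trajectory(["13", "42", "42"]),
    Trajectory(["42", "42", "42"]),
]

print(run_controller(trajs, width=4, agree_threshold=0.75, max_depth=3))
print(run_controller(trajs, width=2, agree_threshold=1.0, max_depth=3))
```

Because each replay touches only stored data, sweeping `width` and `agree_threshold` over many values is cheap, which is one way to read the abstract's requirement of "cheap, frequent feedback" for the search.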

Code & Models

Repositories

zhengkid/AutoTTS
github
