Revisiting Test-Time Scaling: A Survey and a Diversity-Aware Method for Efficient Reasoning
Ho-Lam Chung, Teng-Yun Hsiao, Hsiao-Ying Huang, Chunerh Cho, Jian-Ren Lin, Zhang Ziwei, Yun-Nung Chen

TL;DR
This paper surveys Test-Time Scaling methods for Large Language Models and introduces ADAPT, a diversity-aware prefix tuning technique that significantly enhances reasoning performance with less compute by promoting output diversity.
Contribution
It provides a structured survey of TTS methods and proposes ADAPT, a novel diversity-focused fine-tuning approach that improves reasoning accuracy efficiently.
Findings
ADAPT reaches 80% accuracy on mathematical reasoning tasks.
ADAPT reaches that accuracy with eight times less compute than strong baselines.
Diversity is crucial for maximizing TTS effectiveness.
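One way to see why diversity matters is a simplified coverage model. It is an illustrative assumption, not a formula taken from the paper: let p be the probability that a single sampled solution is correct and N the number of samples drawn at test time.

```latex
% Assumption: the N samples are statistically independent (maximally diverse).
P_{\text{indep}}(\text{at least one correct}) = 1 - (1 - p)^{N}

% Assumption: the N samples are identical copies of one another (no diversity).
P_{\text{identical}}(\text{at least one correct}) = p
```

Under these assumptions, extra samples only help to the extent that they differ from one another. With p = 0.5 and N = 8, the independent case gives 1 - 0.5^8 ≈ 0.996, while the identical case stays at 0.5 no matter how large N grows.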
Abstract
Test-Time Scaling (TTS) improves the reasoning performance of Large Language Models (LLMs) by allocating additional compute during inference. We conduct a structured survey of TTS methods and categorize them into sampling-based, search-based, and trajectory optimization strategies. We observe that reasoning-optimized models often produce less diverse outputs, which limits TTS effectiveness. To address this, we propose ADAPT (A Diversity Aware Prefix fine-Tuning), a lightweight method that applies prefix tuning with a diversity-focused data strategy. Experiments on mathematical reasoning tasks show that ADAPT reaches 80% accuracy using eight times less compute than strong baselines. Our findings highlight the essential role of generative diversity in maximizing TTS effectiveness.
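As a rough illustration of the sampling-based family mentioned in the abstract, the sketch below draws several candidate answers, picks the most frequent one by majority vote, and reports a simple diversity score (the fraction of distinct answers). It is not the ADAPT method and not code from the paper: `toy_sampler` is a hypothetical stand-in for a model call, and `distinct_ratio` is an assumed, deliberately crude diversity measure.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer among the sampled candidates."""
    return Counter(answers).most_common(1)[0][0]


def distinct_ratio(answers: List[str]) -> float:
    """Crude diversity score: number of distinct answers / number of samples."""
    return len(set(answers)) / len(answers)


def sample_and_vote(
    sampler: Callable[[random.Random], str], n: int, seed: int = 0
) -> Tuple[str, float]:
    """Draw n answers from `sampler`, then return (voted answer, diversity)."""
    rng = random.Random(seed)
    answers = [sampler(rng) for _ in range(n)]
    return majority_vote(answers), distinct_ratio(answers)


def toy_sampler(rng: random.Random) -> str:
    """Hypothetical stand-in for one LLM generation (assumption, not a real API)."""
    return rng.choice(["42", "42", "42", "41", "40"])


if __name__ == "__main__":
    answer, diversity = sample_and_vote(toy_sampler, n=16)
    print(f"voted answer: {answer}, distinct ratio: {diversity:.2f}")
```

In a real setting, `toy_sampler` would be replaced by a call that generates one reasoning trajectory and extracts its final answer. The number of samples `n` is the test-time compute knob that the abstract's "eight times less compute" comparison refers to.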
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attentive Walk-Aggregating Graph Neural Network
