T$^2$: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering
Zhengyi Zhao, Shubo Zhang, Zezhong Wang, Huimin Wang, Yutian Zhao, Bin Liang, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu

TL;DR
T$^2$ is an adaptive reasoning framework for contextual question answering that dynamically adjusts reasoning depth based on question complexity, improving accuracy and reducing computational costs.
Contribution
It introduces a novel test-time adaptive strategy that leverages similar questions to determine optimal reasoning depth for each query.
Findings
Achieves higher accuracy than baseline methods.
Reduces computational overhead by up to 25.2%.
Effective across seven diverse CQA benchmarks.
Abstract
Recent advances in Large Language Models (LLMs) have demonstrated remarkable performance in Contextual Question Answering (CQA). However, prior approaches typically employ elaborate reasoning strategies regardless of question complexity, leading to low adaptability. Recent efficient test-time scaling methods introduce budget constraints or early-stopping mechanisms to avoid overthinking on straightforward questions. However, they add human bias to the reasoning process and fail to leverage models' inherent reasoning capabilities. To address these limitations, we present T$^2$: Think-to-Think, a novel framework that dynamically adapts reasoning depth based on question complexity. T$^2$ leverages the insight that if an LLM can effectively solve similar questions using a specific reasoning strategy, it can apply the same strategy to the original question. This insight enables the adoption of…
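The core insight in the abstract (pick a reasoning strategy that already works on similar questions, then reuse it on the original question) can be illustrated with a toy sketch. This is not the authors' implementation: the `Strategy` type, the `pick_strategy` and `answer` helpers, the integer `cost` field, the fallback rule, and the assumption that similar questions come with known reference answers are all hypothetical, and the "solvers" are stand-in arithmetic functions rather than LLM calls.

```python
from typing import Callable, NamedTuple, Sequence


class Strategy(NamedTuple):
    """Hypothetical reasoning strategy: a name, an abstract cost, and a solver."""
    name: str
    cost: int
    solve: Callable[[str], str]


def pick_strategy(
    similar: Sequence[tuple[str, str]],
    strategies: Sequence[Strategy],
) -> Strategy:
    """Return the cheapest strategy that answers every similar question correctly.

    `similar` holds (question, reference_answer) pairs; having reference
    answers available is an assumption of this sketch.
    """
    for strategy in sorted(strategies, key=lambda s: s.cost):
        if all(strategy.solve(q) == gold for q, gold in similar):
            return strategy
    # Assumed fallback: if nothing passes, use the most expensive strategy.
    return max(strategies, key=lambda s: s.cost)


def answer(question: str, similar: Sequence[tuple[str, str]],
           strategies: Sequence[Strategy]) -> str:
    """Apply the strategy selected on similar questions to the original one."""
    return pick_strategy(similar, strategies).solve(question)


# Toy stand-ins for shallow vs. deep reasoning over questions like "1+2+3".
def shallow(q: str) -> str:
    parts = q.split("+")
    # Only handles exactly two operands; gives up otherwise.
    return str(int(parts[0]) + int(parts[1])) if len(parts) == 2 else "unknown"


def deep(q: str) -> str:
    # Handles any number of operands.
    return str(sum(int(p) for p in q.split("+")))


STRATEGIES = [Strategy("deep", 5, deep), Strategy("shallow", 1, shallow)]
EASY_SIMILAR = [("1+2", "3"), ("4+5", "9")]
HARD_SIMILAR = [("1+2+3", "6"), ("2+2+2", "6")]

easy_choice = pick_strategy(EASY_SIMILAR, STRATEGIES).name
hard_choice = pick_strategy(HARD_SIMILAR, STRATEGIES).name
```

In this toy setup the cheap strategy is selected when it already solves the similar questions, and the expensive one is selected only when the cheap one fails, which is the sense in which reasoning depth adapts to question complexity.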
