Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
Neharika Jali, Anupam Nayak, Gauri Joshi

TL;DR
This paper introduces TAB, a method for adaptively allocating reasoning budgets in multi-turn LLM tasks, improving efficiency by saving tokens while maintaining accuracy.
Contribution
It formulates multi-turn reasoning as a sequential compute allocation problem and proposes a learned policy, TAB, to optimize token usage across turns.
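As a rough illustration of what sequential compute allocation means here, the sketch below walks through the turns of one problem, asks a policy for a per-turn thinking budget, and clips each proposal so the running total never exceeds a global cap. The function names, the policy signature, and the word-count heuristic are hypothetical stand-ins for illustration only; the paper's policy is learned, and its actual interface is not described on this page.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, int]]  # (turn text, budget granted) pairs seen so far


def allocate_budgets(
    turns: List[str],
    policy: Callable[[History, str, int], int],
    total_budget: int,
) -> List[int]:
    """Assign a token budget to each turn without exceeding total_budget."""
    remaining = total_budget
    history: History = []
    budgets: List[int] = []
    for turn in turns:
        proposed = policy(history, turn, remaining)
        # Clip the proposal to what is left of the global per-problem budget.
        budget = max(0, min(proposed, remaining))
        budgets.append(budget)
        remaining -= budget
        history.append((turn, budget))
    return budgets


def length_heuristic_policy(history: History, turn: str, remaining: int) -> int:
    """Hypothetical placeholder policy: 50 tokens per word of the turn."""
    return 50 * len(turn.split())


budgets = allocate_budgets(
    ["What is 2+2?", "Now prove the general case for all even integers n"],
    length_heuristic_policy,
    total_budget=400,
)
print(budgets)  # [150, 250]
```

The second turn asks for 500 tokens but receives only the 250 that remain, which is the kind of cross-turn coupling that a single-turn budgeting scheme does not have to handle.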
Findings
TAB saves up to 35% of tokens while maintaining accuracy.
TAB All-SubQ, a variant that considers all sub-questions, saves up to 40% of tokens.
Experiments on mathematical reasoning benchmarks validate the effectiveness of the approach.
Abstract
As LLM reasoning performance plateaus, improving inference-time compute efficiency is crucial to mitigate overthinking and long thinking traces even for simple queries. Prior approaches, including length regularization, adaptive routing, and difficulty-based budget allocation, primarily focus on single-turn settings and fail to address the sequential dependencies inherent in multi-turn reasoning. In this work, we formulate multi-turn reasoning as a sequential compute allocation problem and model it as a multi-objective Markov Decision Process. We propose TAB: Turn-Adaptive Budgets, a budget allocation policy trained via Group Relative Policy Optimization (GRPO) that learns to maximize task accuracy while respecting global per-problem token constraints. Consequently, TAB takes as input the conversation history and learns to adaptively allocate smaller budgets to easier turns and save…
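For readers unfamiliar with GRPO, its core step is to score each sampled rollout relative to the other rollouts in its group, by subtracting the group mean reward and dividing by the group standard deviation. The sketch below shows that normalization together with a toy reward that trades accuracy against overshooting a token cap. The reward shape, the `penalty` weight, and the use of the population standard deviation are assumptions made for illustration; they are not taken from the paper.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: normalize rewards within one group of rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an implementation choice
    return [(r - mean) / (std + eps) for r in rewards]


def toy_reward(correct: bool, tokens_used: int, token_cap: int, penalty: float = 1.0) -> float:
    """Hypothetical reward: 1 for a correct answer, minus a penalty that
    grows with the fraction by which the token cap was exceeded."""
    overshoot = max(0, tokens_used - token_cap) / token_cap
    return float(correct) - penalty * overshoot


# Four rollouts of the same problem under a 1000-token cap.
rewards = [
    toy_reward(True, 800, 1000),
    toy_reward(False, 600, 1000),
    toy_reward(True, 1200, 1000),
    toy_reward(False, 1500, 1000),
]
advantages = group_relative_advantages(rewards)
```

In this toy group, the correct rollout that stays within the cap receives the largest advantage, and the incorrect rollout that also overshoots receives the smallest.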
