Scalable Chain of Thoughts via Elastic Reasoning
Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong

TL;DR
Elastic Reasoning is a framework for scalable chain-of-thought reasoning that separates the thinking and solution phases and gives each its own budget, so that models can reason effectively under resource constraints and still produce concise, reliable outputs.
Contribution
The paper presents a novel elastic reasoning framework with a budget-aware training strategy, improving reasoning robustness and efficiency under strict inference constraints.
Findings
Performs well on mathematical and programming benchmarks under tight budgets.
Achieves lower training costs compared to baseline methods.
Produces more concise reasoning in unconstrained settings.
Abstract
Large reasoning models (LRMs) have achieved remarkable progress on complex tasks by generating extended chains of thought (CoT). However, their uncontrolled output lengths pose significant challenges for real-world deployment, where inference-time budgets on tokens, latency, or compute are strictly constrained. We propose Elastic Reasoning, a novel framework for scalable chain of thoughts that explicitly separates reasoning into two phases--thinking and solution--with independently allocated budgets. At test time, Elastic Reasoning prioritizes the completeness of solution segments, significantly improving reliability under tight resource constraints. To train models that are robust to truncated thinking, we introduce a lightweight budget-constrained rollout strategy, integrated into GRPO, which teaches the model to reason adaptively when the thinking process is cut short and generalizes…
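The two-phase budgeting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The `</think>` delimiter, the `<eos>` token, the `elastic_generate` helper, and the `toy_model` stand-in for a language model are all assumptions made for illustration. The sketch shows only the inference-time idea: thinking is cut off once its budget is spent, the phase is closed explicitly, and the solution then gets its own separate budget.

```python
from typing import Callable, List

END_THINK = "</think>"  # assumed delimiter between thinking and solution
EOS = "<eos>"           # assumed end-of-sequence token


def elastic_generate(
    step: Callable[[List[str]], str],
    prompt: List[str],
    think_budget: int,
    solution_budget: int,
) -> List[str]:
    """Generate tokens with separate budgets for thinking and solution.

    `step` is a hypothetical next-token function: it maps the context so
    far to one new token.
    """
    out: List[str] = []

    # Phase 1: thinking, capped at think_budget tokens.
    closed = False
    for _ in range(think_budget):
        tok = step(prompt + out)
        out.append(tok)
        if tok == END_THINK:
            closed = True
            break
    if not closed:
        # Budget spent mid-thought: close the phase by force so the
        # solution still gets its full, separate budget.
        out.append(END_THINK)

    # Phase 2: solution, capped at solution_budget tokens.
    for _ in range(solution_budget):
        tok = step(prompt + out)
        if tok == EOS:
            break
        out.append(tok)
    return out


def toy_model(context: List[str]) -> str:
    """Toy stand-in for a model: thinks for 5 tokens, then answers."""
    if END_THINK in context:
        n_after = len(context) - context.index(END_THINK) - 1
        return ["answer", "42", EOS][min(n_after, 2)]
    n_think = sum(1 for t in context if t == "hmm")
    return "hmm" if n_think < 5 else END_THINK


print(elastic_generate(toy_model, ["question"], think_budget=3, solution_budget=4))
# → ['hmm', 'hmm', 'hmm', '</think>', 'answer', '42']
```

The toy model happens to answer correctly after truncated thinking. In a real model that behavior is what the budget-constrained rollout training is meant to teach, and the sketch does not cover that training step.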
Peer Reviews
Decision · ICLR 2026 Poster
Token budget is a key issue for reasoning; the method is very simple and sensible, and the performance is great.
Section 3.2.3 could probably be clarified: the authors should provide a clearer description of the quantities involved, in particular the meaning of the conditioning in the policy and the differences from a vanilla GRPO procedure.
- Very simple method that solves the problem of truncated solutions in long reasoning.
- The thinking-solution ablation (4.4.1) is interesting and is good evidence for understanding what the proposed training method improves (i.e., generating a solution under an incomplete thinking process).
## Major
- Figures 1, 4, 5, and 6 are a bit unclear (this may be a minor weakness, but I assigned it as a major weakness for now because it is a crucial experimental setup):
  - What are the points? Do they correspond to the whole AIME questions across different budgets?
  - What is the x-axis? Is it the average tokens used?
  - Could you include the error bars (x- and y-axis)? This is particularly important as there are cases where the Pass@1 and tokens used are not significantly different
- Novel framework for budget-aware reasoning – The proposed Elastic Reasoning introduces a clear separation between thinking and solution phases, enabling fine-grained control over inference cost without sacrificing performance.
- Strong empirical efficiency and robustness – The method achieves a reduction in token usage while maintaining or even improving accuracy on diverse math and coding benchmarks.
- Excellent generalization under unseen budgets – Models trained with a single budget configura
- The method is only tested on strong reasoning models (DeepScaleR, DeepCoder); it’s unclear whether it generalizes to weaker models like Qwen2.5-Math, which lack explicit CoT structure or strong reasoning priors.
- The paper shows that most improvement comes from the solution phase, while increasing the thinking budget (e.g., 2K–3K tokens) brings little additional gain. This suggests that the model may not truly improve its reasoning efficiency—instead, it might rely on memorized solutions rath
Taxonomy
Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Topic Modeling
