Scalable Chain of Thoughts via Elastic Reasoning

Yuhui Xu; Hanze Dong; Lei Wang; Doyen Sahoo; Junnan Li; Caiming Xiong

arXiv:2505.05315 · cs.LG · May 22, 2025

TL;DR

Elastic Reasoning introduces a scalable chain-of-thought framework that separates the thinking and solution phases. This lets models reason effectively under resource constraints and produce concise, reliable outputs.
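The two-phase separation summarized above can be illustrated with a minimal decoding sketch that uses independent thinking and solution budgets. Several names here are assumptions and do not come from the authors' repository:

- `next_token` is a hypothetical function standing in for the model.
- `</think>` is assumed to mark the end of thinking, and `<eos>` the end of the output.
- `elastic_decode` and `toy_model` are illustrative names.

If thinking has not ended when `think_budget` runs out, the sketch closes it explicitly. The solution phase therefore still receives its full `solution_budget`.

```python
from typing import Callable, List

END_THINK = "</think>"  # assumed end-of-thinking marker
EOS = "<eos>"           # assumed end-of-sequence token


def elastic_decode(
    next_token: Callable[[List[str]], str],  # hypothetical model: context -> next token
    prompt: List[str],
    think_budget: int,
    solution_budget: int,
) -> List[str]:
    """Decode with independent thinking and solution budgets (illustrative sketch)."""
    context = list(prompt)

    # Phase 1: thinking, capped at think_budget tokens.
    for _ in range(think_budget):
        tok = next_token(context)
        context.append(tok)
        if tok == END_THINK:
            break
    else:
        # Budget exhausted mid-thought: force the transition to the solution phase.
        context.append(END_THINK)

    # Phase 2: solution, with its own budget so long thinking cannot starve it.
    for _ in range(solution_budget):
        tok = next_token(context)
        context.append(tok)
        if tok == EOS:
            break

    return context[len(prompt):]


# Toy "model" that never closes its thinking on its own and answers once </think> appears.
def toy_model(context: List[str]) -> str:
    if END_THINK in context:
        return "42" if context[-1] == END_THINK else EOS
    return "hmm"


out = elastic_decode(toy_model, ["Q:"], think_budget=3, solution_budget=5)
# out == ["hmm", "hmm", "hmm", "</think>", "42", "<eos>"]
```

Because the two budgets are independent, a long thinking trace cannot consume the tokens reserved for the answer. This is one simple way to realize the prioritization of complete solution segments that the abstract describes.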

Contribution

The paper presents a novel elastic reasoning framework with a budget-aware training strategy, improving reasoning robustness and efficiency under strict inference constraints.

Findings

1. Performs well on mathematical and programming benchmarks under tight budgets.
2. Achieves lower training costs compared with baseline methods.
3. Produces more concise reasoning in unconstrained settings.

Abstract

Large reasoning models (LRMs) have achieved remarkable progress on complex tasks by generating extended chains of thought (CoT). However, their uncontrolled output lengths pose significant challenges for real-world deployment, where inference-time budgets on tokens, latency, or compute are strictly constrained. We propose Elastic Reasoning, a novel framework for scalable chain of thoughts that explicitly separates reasoning into two phases, thinking and solution, with independently allocated budgets. At test time, Elastic Reasoning prioritizes the completeness of solution segments, significantly improving reliability under tight resource constraints. To train models that are robust to truncated thinking, we introduce a lightweight budget-constrained rollout strategy, integrated into GRPO, which teaches the model to reason adaptively when the thinking process is cut short and generalizes…
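The budget-constrained rollout described in the abstract can be sketched as follows. This is not the authors' implementation, and it rests on several assumptions:

- `generate(context, max_tokens, stop)` is a hypothetical sampler that includes the stop token in its output when it stops early.
- The reward is a simple exact-match check on the solution segment.
- Advantages follow the standard GRPO group-relative form: each rollout's reward is normalized by the mean and standard deviation of its group.
- The population standard deviation and the `eps` value are choices made here.

The policy-gradient update itself is omitted.

```python
import statistics
from typing import Callable, List

END_THINK = "</think>"  # assumed end-of-thinking marker
EOS = "<eos>"           # assumed end-of-sequence token

# Hypothetical sampler: (context, max_tokens, stop_token) -> newly sampled tokens,
# ending with stop_token only if the model emitted it within max_tokens.
Sampler = Callable[[List[str], int, str], List[str]]


def budget_constrained_rollout(
    generate: Sampler, prompt: List[str], think_budget: int, solution_budget: int
) -> List[str]:
    """Sample one rollout whose thinking is capped at think_budget tokens."""
    thinking = generate(prompt, think_budget, END_THINK)
    if not thinking or thinking[-1] != END_THINK:
        # Thinking was cut short: close it so the model must answer from a partial trace.
        thinking = thinking + [END_THINK]
    solution = generate(prompt + thinking, solution_budget, EOS)
    return thinking + solution


def answer_reward(rollout: List[str], answer: str) -> float:
    """Binary reward: 1.0 if the reference answer appears after the thinking marker."""
    if END_THINK not in rollout:
        return 0.0
    solution = rollout[rollout.index(END_THINK) + 1:]
    return 1.0 if answer in solution else 0.0


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: (r_i - mean(r)) / (std(r) + eps) within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy sampler that never stops thinking on its own and answers "42" once thinking is closed.
def toy_generate(context: List[str], max_tokens: int, stop: str) -> List[str]:
    if context[-1] == END_THINK:
        return ["42", stop][:max_tokens]
    return ["step"] * max_tokens


rollout = budget_constrained_rollout(toy_generate, ["Q:"], think_budget=2, solution_budget=4)
reward = answer_reward(rollout, "42")
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])  # rewards of a hypothetical group of G = 4 rollouts
```

Compared with a vanilla GRPO rollout, the only change in this sketch is that thinking is capped and explicitly closed before the solution is sampled. The reward, and hence the advantage, therefore credits answers produced after an interrupted thinking phase.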

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

Token budget is a key issue for reasoning. The method is very simple and sensible, and the performance is great.

Weaknesses

Section 3.2.3 could probably be clarified. In particular, the authors should describe the quantities involved more clearly. This applies especially to the meaning of the conditioning in the policy and to how the procedure differs from vanilla GRPO.

Reviewer 02 · Rating 2 · Confidence 3

Strengths

- Very simple method that solves the problem of truncated solutions in long reasoning.
- The thinking-solution ablation (4.4.1) is interesting and is good evidence for understanding what the proposed training method improves (i.e., generating a solution under an incomplete thinking process).

Weaknesses

## Major
- Figures 1, 4, 5, and 6 are a bit unclear. This may be a minor weakness, but I have assigned it as major for now because it concerns a crucial experimental setup:
  - What are the points? Do they correspond to all AIME questions across different budgets?
  - What is the x-axis? Is it the average number of tokens used?
  - Could you include error bars (x- and y-axis)? This is particularly important because there are cases where Pass@1 and tokens used are not significantly different.
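Below is one way to produce the error bars requested here, sketched under stated assumptions. Each question is treated as the unit of variation, and correctness and token counts are averaged over the samples drawn for that question. Both axes then report the mean with its standard error, computed as the sample standard deviation divided by the square root of the number of questions.

The data layout and the names `mean_and_sem` and `budget_point` are hypothetical. A bootstrap over questions would be a reasonable alternative.

```python
import math
import statistics
from typing import List, Tuple


def mean_and_sem(values: List[float]) -> Tuple[float, float]:
    """Return the mean and the standard error of the mean."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, 0.0  # SEM is undefined for a single value; report 0 here
    return mean, statistics.stdev(values) / math.sqrt(len(values))


def budget_point(
    correct: List[List[int]], tokens: List[List[int]]
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Compute one (tokens, Pass@1) point with x and y error bars for a single budget.

    correct[q][k] is 1 if sample k for question q is correct, else 0;
    tokens[q][k] is the number of tokens that sample used.
    Returns ((mean_tokens, sem_tokens), (mean_pass1, sem_pass1)).
    """
    per_question_pass1 = [statistics.fmean(c) for c in correct]
    per_question_tokens = [statistics.fmean(t) for t in tokens]
    return mean_and_sem(per_question_tokens), mean_and_sem(per_question_pass1)


# Toy data: 4 questions, 2 samples each, at one budget.
correct = [[1, 1], [1, 0], [0, 0], [1, 1]]
tokens = [[100, 120], [200, 220], [300, 280], [150, 150]]
(x, x_err), (y, y_err) = budget_point(correct, tokens)
```

Calling `budget_point` once per budget yields one point per budget. Each point carries a horizontal error bar for token usage and a vertical one for Pass@1.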

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- Novel framework for budget-aware reasoning: the proposed Elastic Reasoning introduces a clear separation between thinking and solution phases, enabling fine-grained control over inference cost without sacrificing performance.
- Strong empirical efficiency and robustness: the method reduces token usage while maintaining or even improving accuracy on diverse math and coding benchmarks.
- Excellent generalization under unseen budgets: models trained with a single budget configura

Weaknesses

- The method is only tested on strong reasoning models (DeepScaleR, DeepCoder). It is unclear whether it generalizes to weaker models like Qwen2.5-Math, which lack explicit CoT structure or strong reasoning priors.
- The paper shows that most improvement comes from the solution phase, while increasing the thinking budget (e.g., 2K–3K tokens) brings little additional gain. This suggests that the model may not truly improve its reasoning efficiency; instead, it might rely on memorized solutions rath

Code & Models

Repositories

salesforceairesearch/elastic-reasoning
PyTorch · Official

Taxonomy

Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Topic Modeling