Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation

Zihang Xu; Haozhi Xie; Ziqi Miao; Wuxuan Gong; Chen Qian; Lijun Li

arXiv:2602.22556 · cs.LG · February 27, 2026

Open Access

TL;DR

This paper introduces a two-stage framework for stable adaptive thinking in large reasoning models: Hybrid Fine-Tuning, followed by reinforcement learning with advantage shaping and length-aware gradient regulation, with the aim of improving accuracy while reducing the number of generated tokens.

Contribution

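The paper proposes a two-stage training framework. Hybrid Fine-Tuning first exposes the model to both thinking and no-thinking behaviors. Adaptive reinforcement learning then applies Correctness-Preserving Advantage Shaping (CPAS) and Length-Aware Gradient Regulation (LAGR) to keep training stable when reasoning lengths vary widely.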

Findings

01 Achieves up to +3.7 accuracy points on benchmark tasks.

02 Reduces generated tokens by over 40%, improving efficiency.

03 Demonstrates robustness across various problem difficulties and out-of-distribution tasks.

Abstract

Large reasoning models (LRMs) achieve strong performance through extended reasoning traces, but they often exhibit overthinking behavior for low-complexity queries. Existing efforts to mitigate this issue are fundamentally limited by unstable accuracy-efficiency trade-offs and poor robustness to heterogeneous reasoning behaviors. To address these challenges, we propose a two-stage framework for stable adaptive thinking in LRMs. The framework first applies Hybrid Fine-Tuning to expose the model to both thinking and no-thinking behaviors, establishing well-conditioned initialization. It then performs adaptive reinforcement learning with Correctness-Preserving Advantage Shaping (CPAS) to avoid suppressing correct long-chain reasoning, and Length-Aware Gradient Regulation (LAGR) to stabilize optimization under severe reasoning-length heterogeneity. Extensive experiments on Qwen2.5-1.5B and…
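The abstract names CPAS and LAGR but does not give their formulas. The Python sketch below is only one possible reading of those two ideas, not the authors' method. It assumes a GRPO-style group-normalized advantage, a hypothetical length penalty that is clamped so a correct response never receives a negative advantage, and a simple per-sequence token weighting so that long traces do not dominate the gradient. The function names, the `length_coef` parameter, and the clamping rule are all illustrative assumptions.

```python
from statistics import mean, pstdev


def shaped_advantages(correct, lengths, length_coef=0.5, eps=1e-6):
    """Toy group-relative advantages with a clamped length penalty.

    Assumption (not from the paper): correct responses are penalized in
    proportion to their relative length, but never below zero, so a correct
    long chain of reasoning is not actively suppressed.
    """
    rewards = [1.0 if c else 0.0 for c in correct]
    mu, sigma = mean(rewards), pstdev(rewards)
    base = [(r - mu) / (sigma + eps) for r in rewards]
    max_len = max(lengths)
    shaped = []
    for adv, ok, n in zip(base, correct, lengths):
        if ok:
            penalty = length_coef * n / max_len
            shaped.append(max(adv - penalty, 0.0))  # clamp: stay non-negative
        else:
            shaped.append(adv)  # incorrect responses keep the base advantage
    return shaped


def per_token_weights(lengths):
    """Hypothetical length-aware weighting: every sequence contributes the
    same total weight to the loss, regardless of how many tokens it has."""
    num_seq = len(lengths)
    return [1.0 / (n * num_seq) for n in lengths]


if __name__ == "__main__":
    correct = [True, True, False, False]
    lengths = [100, 1000, 500, 500]
    print(shaped_advantages(correct, lengths))
    print(per_token_weights(lengths))
```

In this toy setup a short correct answer ends up with a larger advantage than a long correct one, while the long correct answer is still not pushed below zero. Whether the paper's CPAS and LAGR behave this way cannot be determined from the abstract alone.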

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications