On the Plasticity and Stability for Post-Training Large Language Models

Wenwen Qiang; Ziyin Gu; Jiahuan Zhou; Jie Hu; Jingyao Wang; Changwen Zheng; Hui Xiong

arXiv:2602.06453 · cs.LG · February 9, 2026

Open Access

TL;DR

This paper introduces Probabilistic Conflict Resolution (PCR), a Bayesian framework that improves training stability and reasoning performance when post-training large language models with GRPO by resolving conflicts between plasticity and stability gradients in an uncertainty-aware way.

Contribution

The paper proposes PCR, a stochastic gradient conflict resolution method that outperforms deterministic approaches in training large language models.

Findings

01 PCR smooths training trajectories

02 PCR achieves superior reasoning task performance

03 PCR effectively manages gradient conflicts

Abstract

Training stability remains a critical bottleneck for Group Relative Policy Optimization (GRPO), often manifesting as a trade-off between reasoning plasticity and general capability retention. We identify a root cause as the geometric conflict between plasticity and stability gradients, which leads to destructive interference. Crucially, we argue that deterministic projection methods are suboptimal for GRPO as they overlook the intrinsic stochasticity of group-based gradient estimates. To address this, we propose Probabilistic Conflict Resolution (PCR), a Bayesian framework that models gradients as random variables. PCR dynamically arbitrates conflicts via an uncertainty-aware "soft projection" mechanism, optimizing the signal-to-noise ratio. Extensive experiments demonstrate that PCR significantly smooths the training trajectory and achieves superior performance in various reasoning…
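To make the contrast in the abstract concrete, the sketch below compares a deterministic projection with one possible uncertainty-weighted variant. It is an illustration only, not the paper's PCR algorithm: the abstract does not give PCR's update rule. The function names `hard_projection` and `soft_projection`, the Gaussian approximation for the conflict probability, and the toy two-dimensional gradients are all assumptions made here. The deterministic rule is the familiar PCGrad-style projection, which removes the conflicting component whenever the dot product is negative.

```python
import math
import statistics


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hard_projection(g_plastic, g_stable):
    """Deterministic (PCGrad-style) rule: if the two gradients conflict
    (negative dot product), remove the component of g_plastic along g_stable."""
    d = dot(g_plastic, g_stable)
    if d >= 0:
        return list(g_plastic)
    scale = d / dot(g_stable, g_stable)
    return [a - scale * b for a, b in zip(g_plastic, g_stable)]


def soft_projection(plastic_samples, g_stable):
    """Hypothetical uncertainty-weighted rule (an assumption, not the paper's PCR).

    plastic_samples: at least two noisy estimates of the plasticity gradient,
    e.g. one per group. The projection strength is scaled by an estimated
    probability that the underlying gradients truly conflict.
    """
    dim = len(g_stable)
    mean_p = [statistics.fmean(s[i] for s in plastic_samples) for i in range(dim)]
    dots = [dot(s, g_stable) for s in plastic_samples]
    mu = statistics.fmean(dots)
    std_err = statistics.stdev(dots) / math.sqrt(len(dots))
    if std_err == 0:
        p_conflict = 1.0 if mu < 0 else 0.0
    else:
        # Gaussian approximation of P(true dot product < 0).
        p_conflict = 0.5 * (1.0 + math.erf(-mu / (std_err * math.sqrt(2.0))))
    # Only a negative (conflicting) component is removed, scaled by p_conflict.
    scale = p_conflict * min(0.0, mu) / dot(g_stable, g_stable)
    return [a - scale * b for a, b in zip(mean_p, g_stable)], p_conflict


g_stable = [0.0, 1.0]
noisy = [[1.0, -3.0], [1.0, 1.0]]  # the two estimates disagree on the conflict
print(hard_projection([1.0, -1.0], g_stable))
print(soft_projection(noisy, g_stable))
```

When every sample agrees that the gradients conflict, this soft rule reduces to the deterministic projection. When the samples disagree, only part of the conflicting component is removed, which is one way to read "soft projection" under noisy group-based estimates.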

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)