Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models
Hoang Phan, Xianjun Yang, Kevin Yao, Jingyu Zhang, Shengjie Bi, Xiaocheng Tang, Madian Khabsa, Lijuan Liu, Deren Lei

TL;DR
This paper introduces RECAP, a dynamic replay strategy that reweights objectives during training to prevent forgetting of foundational skills in large reasoning models, improving knowledge retention and reasoning performance.
Contribution
The paper proposes RECAP, an online reweighting method for experience replay that preserves general capabilities in large reasoning models during reinforcement learning with verifiable rewards.
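To make "online reweighting" of replayed objectives concrete, here is a minimal sketch. This page does not state RECAP's actual weighting rule, so the softmax-over-losses rule, the objective names, and the `temperature` parameter are all illustrative assumptions, not the authors' method. The sketch only shows the general idea: objectives that currently have higher loss receive a larger share of the training signal.

```python
import math

def reweight(losses, temperature=1.0):
    """Hypothetical rule: softmax over current per-objective losses.

    Objectives with higher loss get a larger weight. This is an assumed
    stand-in for illustration, not RECAP's published rule.
    """
    peak = max(losses.values())  # subtract the max for numerical stability
    exps = {name: math.exp((value - peak) / temperature)
            for name, value in losses.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

def combined_loss(losses, weights):
    """Weighted sum of the per-objective losses."""
    return sum(weights[name] * losses[name] for name in losses)

# Made-up loss values for one training step (objective names are assumptions).
losses = {"reasoning": 0.4, "perception": 0.9, "faithfulness": 0.4}
weights = reweight(losses)
total = combined_loss(losses, weights)
```

In a training loop, `reweight` would be called again as the per-objective losses change, so the mix between the reasoning task and the replayed general-capability data shifts over time instead of being fixed in advance.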
Findings
RECAP preserves core capabilities such as perception and faithfulness.
RECAP improves reasoning performance and knowledge retention.
The method is simple, end-to-end, and adaptable to existing pipelines.
Abstract
Reinforcement learning with verifiable rewards (RLVR) has delivered impressive gains in mathematical and multimodal reasoning and has become a standard post-training paradigm for contemporary language and vision-language models. However, the RLVR recipe introduces a significant risk of capability regression, where models forget foundational skills after prolonged training without regularization strategies. We empirically confirm this concern, observing that open-source reasoning models suffer performance degradation on core capabilities such as perception and faithfulness. While imposing regularization terms like KL divergence can help prevent deviation from the base model, these terms are calculated on the current task and thus do not guarantee broader knowledge. Meanwhile, commonly used experience replay across heterogeneous domains makes it nontrivial to decide how much…
