ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

Zihan Lin; Xiaohan Wang; Jie Cao; Jiajun Chai; Li Wang; Xiaodong Lu; Wei Lin; Ran He; Guojun Yin

arXiv:2605.00380·cs.LG·May 11, 2026

TL;DR

ResRL is a reinforcement learning method that improves LLM reasoning while preserving generation diversity by decoupling the similar semantic distributions of positive and negative responses.

Contribution

The paper proposes negative sample projection Residual Reinforcement Learning (ResRL), an approach that improves reasoning in LLMs while maintaining response diversity.

Findings

01. ResRL outperforms strong baselines across twelve benchmarks.

02. ResRL surpasses NSR on mathematical reasoning by 9.4% in Avg@16.

03. ResRL effectively balances reasoning ability and diversity in LLM outputs.
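The Avg@16 figure in the findings is not defined on this page. A minimal sketch, assuming the common convention that Avg@k means sampling k responses per problem, scoring each with a verifier, and averaging the per-problem accuracy over all problems. The function name `avg_at_k` and the toy outcomes are hypothetical and do not come from the paper.

```python
def avg_at_k(correct_by_problem):
    """Mean per-problem accuracy over k sampled responses (assumed Avg@k convention).

    correct_by_problem: one inner list per problem, holding the verifier
    outcome (True/False) for each of that problem's k samples.
    """
    if not correct_by_problem:
        raise ValueError("need at least one problem")
    per_problem = []
    for outcomes in correct_by_problem:
        if not outcomes:
            raise ValueError("each problem needs at least one sample")
        # sum() counts True values, so this is the fraction of correct samples.
        per_problem.append(sum(outcomes) / len(outcomes))
    return sum(per_problem) / len(per_problem)


# Hypothetical outcomes: 2 problems with k = 4 samples each (k = 16 works the same way).
results = [
    [True, True, False, False],
    [True, False, False, False],
]
score = avg_at_k(results)
print(score)
```

Under this assumed definition, the score rewards a model for being correct consistently across samples, not just once, which is why it is often reported alongside diversity-oriented metrics.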

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) enhances the reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this issue by upweighting the penalty from negative samples, they may suppress the semantic distributions shared between positive and negative responses. To boost reasoning ability without losing diversity, this paper proposes negative sample projection Residual Reinforcement Learning (ResRL), which decouples similar semantic distributions among positive and negative responses. We theoretically link Lazy Likelihood Displacement (LLD) to negative-positive head-gradient interference and derive a single-forward proxy that upper-bounds representation alignment to guide conservative advantage reweighting. ResRL then projects…
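The abstract is cut off before it states what ResRL projects, so the following is only a generic illustration of a projection residual and should not be read as the paper's procedure. It assumes, hypothetically, that each response is summarized by a plain vector and that the residual is the component of a negative-response vector orthogonal to the span of the positive-response vectors. All function names and toy vectors are invented for illustration.

```python
from math import sqrt


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = sqrt(dot(w, w))
        if norm > eps:  # skip vectors already inside the current span
            basis.append([wi / norm for wi in w])
    return basis


def projection_residual(v, basis):
    """Part of `v` orthogonal to the span of the orthonormal `basis`."""
    r = list(v)
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return r


# Hypothetical toy representations (assumption: 3-dimensional vectors).
positive = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
negative = [2.0, 3.0, 4.0]

basis = orthonormal_basis(positive)
residual = projection_residual(negative, basis)
print(residual)
```

In this toy case the residual keeps only the third coordinate of the negative vector, the one direction that the two positive vectors do not span; everything the negative vector shares with the positive span is removed.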

Code & Models

Repositories

1229095296/ResRL.git (GitHub)