Adaptive Negative Reinforcement for LLM Reasoning: Dynamically Balancing Correction and Diversity in RLVR

Yash Ingle; Jaival Chauhan; Ankit Yadav; Sudhakar Mishra

arXiv:2605.07137 · cs.LG · May 11, 2026

TL;DR

This paper introduces adaptive and confidence-weighted negative reinforcement techniques to improve large language model reasoning by dynamically balancing correction and diversity during training.

Contribution

The paper proposes two extensions to negative sample reinforcement for LLM training: a time-dependent schedule for the penalty, and confidence-based weighting of the penalty applied to incorrect responses. Both are aimed at improving reasoning performance.

Findings

01. Improved performance on the MATH, AIME 2025, and AMC23 benchmarks.

02. Adaptive and confidence-weighted methods outperform fixed-penalty approaches.

03. Formal analysis shows effective token-level update control.
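
Finding 02 contrasts confidence-weighted penalties with a fixed penalty. This page does not state the weighting rule, so the following Python sketch rests on an assumed one: each incorrect response is penalized in proportion to the model's length-normalized probability of it, so that confidently wrong responses are pushed down harder. The function names, the exponent `alpha`, and the direction of the weighting are all hypothetical and are not taken from the paper.

```python
import math

def response_confidence(token_logprobs):
    """Length-normalized probability of a response (geometric mean of token probs).

    ASSUMPTION: this is one plausible notion of "confidence"; the paper may use another.
    """
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def penalty_weights(responses, alpha=1.0):
    """Per-response penalty weights, normalized to have mean 1.

    alpha = 0 recovers the fixed (uniform) penalty; larger alpha concentrates
    the penalty on responses the model was more confident about (assumed direction).
    """
    raw = [response_confidence(r) ** alpha for r in responses]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

def confidence_weighted_nsr_loss(responses, alpha=1.0):
    """Weighted mean sequence log-probability of incorrect responses.

    Minimizing this value lowers the likelihood of the incorrect responses.
    In a real training loop the weights would be treated as constants
    (no gradient flows through them).
    """
    weights = penalty_weights(responses, alpha)
    seq_logprobs = [sum(r) for r in responses]
    return sum(w * lp for w, lp in zip(weights, seq_logprobs)) / len(responses)
```

Setting `alpha=0` gives every incorrect response the same weight, which makes it easy to compare a fixed penalty against the weighted one on identical data.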

Abstract

Reinforcement learning with verifiable rewards (RLVR) has become a highly effective method for improving the reasoning abilities of Large Language Models (LLMs). Recent research shows that Negative Sample Reinforcement (NSR), which focuses on penalizing incorrect steps rather than simply rewarding correct ones, can match or even exceed the performance of more complex frameworks like PPO and GRPO across the entire Pass@k spectrum. However, current NSR techniques usually apply a fixed penalty throughout training and give every incorrect response the same weight. To address these limitations, we propose two extensions to the NSR framework: Adaptive Negative Sample Reinforcement (A-NSR) and a confidence-weighted variant. Rather than using a fixed update rule, A-NSR uses time-dependent scheduling functions. In the initial training phases, the system focuses heavily on correcting errors to stabilize the model.…
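
To make the idea of a time-dependent penalty concrete, here is a minimal Python sketch of a scheduled weight on the negative-sample term. The abstract only says that early training emphasizes error correction. The decay toward a smaller weight later in training, the cosine and linear shapes, and the names `nsr_weight`, `scheduled_nsr_loss`, `w_start`, and `w_end` are assumptions made for illustration, not the paper's actual scheduling functions.

```python
import math

def nsr_weight(step, total_steps, w_start=1.0, w_end=0.2, kind="cosine"):
    """Hypothetical schedule for the negative-sample penalty weight.

    Starts at w_start (heavy correction early) and decays to w_end.
    ASSUMPTION: the decay direction and endpoints are illustrative only.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress in [0, 1]
    if kind == "linear":
        decay = 1.0 - t
    elif kind == "cosine":
        decay = 0.5 * (1.0 + math.cos(math.pi * t))
    else:
        raise ValueError(f"unknown schedule kind: {kind}")
    return w_end + (w_start - w_end) * decay

def scheduled_nsr_loss(neg_seq_logprobs, step, total_steps, **schedule_kwargs):
    """Scheduled penalty on responses that failed verification.

    neg_seq_logprobs: sequence log-probabilities of incorrect responses.
    Minimizing the returned value lowers the likelihood of those responses,
    scaled by the current schedule weight.
    """
    if not neg_seq_logprobs:
        return 0.0  # no incorrect samples in this batch, nothing to penalize
    w = nsr_weight(step, total_steps, **schedule_kwargs)
    return w * sum(neg_seq_logprobs) / len(neg_seq_logprobs)
```

A fixed-penalty baseline corresponds to `w_start == w_end`, so the same function can be used to compare scheduled and unscheduled runs.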
