Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

Md Tanvirul Alam; Nidhi Rastogi

arXiv:2510.27044 · cs.LG · December 2, 2025

TL;DR

This paper examines the limitations of Reinforcement Learning with Verifiable Rewards (RLVR) in fostering genuine mathematical reasoning in large language models, revealing that RLVR often reinforces superficial heuristics rather than true reasoning strategies.

Contribution

It provides a critical analysis of RLVR's effectiveness on two combinatorial problems (Activity Scheduling and Longest Increasing Subsequence), highlighting its tendency to reinforce shortcuts over genuine reasoning, and emphasizes the need for better benchmarks.

Findings

01

RLVR improves evaluation metrics but often reinforces superficial heuristics.

02

RLVR's ability to foster genuine reasoning is limited across the studied problems.

03

The results highlight the importance of benchmarks that distinguish genuine reasoning from shortcuts.

Abstract

Mathematical reasoning is a central challenge for large language models (LLMs), requiring not only correct answers but also faithful reasoning processes. Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising approach for enhancing such capabilities; however, its ability to foster genuine reasoning remains unclear. We investigate RLVR on two combinatorial problems with fully verifiable solutions: Activity Scheduling and the Longest Increasing Subsequence, using carefully curated datasets with unique optima. Across multiple reward designs, we find that RLVR improves evaluation metrics but often by reinforcing superficial heuristics rather than acquiring new reasoning strategies. These findings highlight the limits of RLVR generalization, emphasizing the importance of benchmarks that disentangle genuine mathematical reasoning from shortcut…
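
Both tasks admit cheap exact verifiers, which is what makes them usable for RLVR. The following is an illustrative sketch, not the authors' code. It assumes the classic unweighted activity-selection formulation with half-open intervals, strictly increasing subsequences, and a hypothetical all-or-nothing reward; the paper's actual reward designs are not described on this page. It computes each optimum (earliest-finish greedy for scheduling, patience sorting for LIS), checks a proposed answer for feasibility, and counts optimal LIS index sequences as one possible way to filter a dataset for unique optima. All function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# Assumption: an activity is a half-open interval [start, finish),
# so one finishing at t is compatible with one starting at t.
Interval = Tuple[int, int]


def max_activities(intervals: Sequence[Interval]) -> int:
    """Maximum number of pairwise compatible activities (earliest-finish greedy)."""
    count, last_finish = 0, float("-inf")
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:  # compatible with the last selected activity
            count += 1
            last_finish = finish
    return count


def is_valid_schedule(intervals: Sequence[Interval], chosen: Sequence[int]) -> bool:
    """A proposed answer must use distinct, in-range indices with no overlaps."""
    if len(set(chosen)) != len(chosen):
        return False
    if any(not 0 <= i < len(intervals) for i in chosen):
        return False
    picked = sorted((intervals[i] for i in chosen), key=lambda iv: iv[0])
    return all(a[1] <= b[0] for a, b in zip(picked, picked[1:]))


def lis_length(xs: Sequence[int]) -> int:
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails: List[int] = []  # tails[k]: smallest tail of any increasing run of length k + 1
    for x in xs:
        k = bisect_left(tails, x)  # bisect_left keeps the subsequence strictly increasing
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def count_lis(xs: Sequence[int]) -> int:
    """Number of optimal index sequences, O(n^2); 1 means the optimum is unique."""
    n = len(xs)
    length, ways = [1] * n, [1] * n
    for j in range(n):
        for i in range(j):
            if xs[i] < xs[j]:
                if length[i] + 1 > length[j]:
                    length[j], ways[j] = length[i] + 1, ways[i]
                elif length[i] + 1 == length[j]:
                    ways[j] += ways[i]
    best = max(length, default=0)
    return sum(w for ln, w in zip(length, ways) if ln == best)


def is_increasing_subsequence(xs: Sequence[int], idx: Sequence[int]) -> bool:
    """Indices must be in range and strictly increasing, and so must the values."""
    if any(not 0 <= i < len(xs) for i in idx):
        return False
    return all(i < j and xs[i] < xs[j] for i, j in zip(idx, idx[1:]))


def binary_reward(valid: bool, size: int, optimum: int) -> float:
    """Hypothetical all-or-nothing reward: 1.0 only for a feasible, optimal answer."""
    return 1.0 if valid and size == optimum else 0.0


if __name__ == "__main__":
    acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
            (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
    answer = [0, 3, 7, 10]
    valid = is_valid_schedule(acts, answer)
    print(binary_reward(valid, len(answer), max_activities(acts)))  # 1.0
    seq = [3, 1, 4, 1, 5, 9, 2, 6]
    print(lis_length(seq), count_lis(seq))  # 4 4 (optimum not unique)
```

Because the reward compares a feasible answer's size against an exactly computed optimum, it can be checked without trusting the model's reasoning, which is also why it cannot by itself tell a genuine strategy from a shortcut that happens to land on the optimum. A partial-credit variant (for example, rewarding feasible but suboptimal answers) would be a different, equally hypothetical, design choice.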

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Text Readability and Simplification