Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

Lecheng Yan; Ruizhe Li; Guanhua Chen; Qing Li; Jiahui Geng; Wenxi Li; Vincent Wang; Chris Lee

arXiv:2601.11061 · cs.LG · January 19, 2026

Open Access

TL;DR

This paper mechanistically analyzes how RLVR with spurious rewards causes LLMs to bypass reasoning and rely on memorization shortcuts, revealing a hidden Anchor-Adapter circuit whose MLP keys can be scaled to causally steer model behavior.

Contribution

It uncovers a neural circuit and functional mechanisms behind memorization shortcuts induced by spurious RLVR in LLMs, providing insights for mitigation.

Findings

01. Identification of a hidden Anchor-Adapter circuit in LLMs.

02. Localization of a Functional Anchor in middle layers triggering memorization.

03. Scaling MLP keys can causally steer model behavior.
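
To make the third finding concrete, here is a minimal sketch of what "scaling an MLP key" can mean under the common key-value reading of a transformer MLP layer, where each hidden neuron has a key vector (a row of the input projection) and a value vector (a row of the output projection). This is an illustrative toy, not the authors' code: the function `mlp`, the `key_scale` argument, the ReLU activation, and the tiny weights are all assumptions introduced here, and the paper's actual intervention may differ.

```python
def relu(x):
    return max(0.0, x)


def mlp(x, keys, values, key_scale=None):
    """Toy key-value MLP: out = sum_i relu(s_i * <k_i, x>) * v_i.

    key_scale maps a neuron index to a multiplier s_i applied to that
    neuron's key (hypothetical intervention knob; default is 1.0).
    """
    key_scale = key_scale or {}
    out = [0.0] * len(values[0])
    for i, (k, v) in enumerate(zip(keys, values)):
        s = key_scale.get(i, 1.0)
        # Scaling the key scales the neuron's pre-activation.
        act = relu(s * sum(ki * xi for ki, xi in zip(k, x)))
        for j, vj in enumerate(v):
            out[j] += act * vj
    return out


# Toy weights (made up for illustration only).
x = [1.0, 2.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]

baseline = mlp(x, keys, values)
suppressed = mlp(x, keys, values, key_scale={1: 0.0})  # switch neuron 1 off
amplified = mlp(x, keys, values, key_scale={1: 2.0})   # double neuron 1's key
```

With a positive scale and a ReLU activation, multiplying a key by a factor multiplies that neuron's contribution to the output by the same factor, which is why a scale of 0 removes the contribution and a scale above 1 strengthens it.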

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is highly effective for enhancing LLM reasoning, yet recent evidence shows models like Qwen 2.5 achieve significant gains even with spurious or incorrect rewards. We investigate this phenomenon and identify a "Perplexity Paradox": spurious RLVR triggers a divergence where answer-token perplexity drops while prompt-side coherence degrades, suggesting the model is bypassing reasoning in favor of memorization. Using Path Patching, Logit Lens, JSD analysis, and Neural Differential Equations, we uncover a hidden Anchor-Adapter circuit that facilitates this shortcut. We localize a Functional Anchor in the middle layers (L18-20) that triggers the retrieval of memorized solutions, followed by Structural Adapters in later layers (L21+) that transform representations to accommodate the shortcut signal. Finally, we demonstrate that scaling…
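
Two of the quantities named in the abstract have standard definitions that can be sketched briefly: perplexity over a span of tokens, and the Jensen-Shannon divergence (JSD) between two probability distributions. The helper names `perplexity` and `jsd` are hypothetical, and how the paper applies these measures (which tokens, which layers, which distributions) is not specified in the text above, so this only illustrates the textbook formulas.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over a token span.

    token_logprobs: natural-log probabilities the model assigned to each
    observed token in the span (e.g. only the answer tokens).
    """
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def jsd(p, q):
    """Jensen-Shannon divergence between two distributions, in bits.

    Symmetric and bounded in [0, 1] when log base 2 is used.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute 0 by convention.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# A span where every token received probability 0.5 has perplexity 2.
uniform_half = [math.log(0.5)] * 4
ppl = perplexity(uniform_half)

# Identical distributions diverge by 0; disjoint ones by 1 bit.
same = jsd([0.25, 0.75], [0.25, 0.75])
disjoint = jsd([1.0, 0.0], [0.0, 1.0])
```

Computing perplexity separately over prompt tokens and answer tokens is one way to observe the kind of divergence the abstract describes, where one measure falls while the other worsens.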

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks