Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

Wanli Yang; Hongyu Zang; Junwei Zhang; Wenjie Shi; Du Su; Jingang Wang; Xueqi Cheng; Fei Sun

arXiv:2605.07153 · cs.CL · May 11, 2026

TL;DR

This paper investigates whether reinforcement learning (RL) improves direct recall of factual knowledge in large language models. It finds that the gains come from redistributing probability mass over knowledge the model already holds rather than from acquiring new facts.

Contribution

The paper demonstrates that RL improves parametric knowledge recall in LLMs by redistributing probability mass rather than by learning new information, and it uses a data-attribution study to identify which challenging training examples are most informative.

Findings

01. RL yields ~27% average relative gains in factual recall across three model families and multiple factual QA benchmarks.

02. RL primarily redistributes probability mass over existing knowledge rather than acquiring new facts.

03. Rare correct answers in training data drive most of the RL gains.
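The "~27% relative gains" figure is a ratio against the baseline accuracy, not an absolute jump in accuracy points. A minimal sketch of that arithmetic follows, assuming the usual definition of relative gain; the function name and the example accuracies are hypothetical and are not taken from the paper.

```python
def relative_gain(base_acc: float, rl_acc: float) -> float:
    """Relative improvement of rl_acc over base_acc (assumed definition).

    Both inputs are accuracies in [0, 1]. A result of 0.27 means the
    RL-trained model is 27% better than the baseline in relative terms.
    """
    if base_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (rl_acc - base_acc) / base_acc


if __name__ == "__main__":
    # Hypothetical numbers: a baseline at 30.0% accuracy that rises to 38.1%
    # gains 8.1 absolute points, which is a 27% relative gain.
    print(f"{relative_gain(0.30, 0.381):.2%}")
```

The distinction matters when comparing models: the same relative gain corresponds to a smaller absolute gain on a weaker baseline.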

Abstract

Reinforcement learning (RL) has achieved remarkable success in LLM reasoning, but whether it can also improve direct recall of parametric knowledge remains an open question. We study this question in a controlled zero-shot, one-hop, closed-book QA setting with no chain-of-thought, training only on binary correctness rewards and applying fact-level train-test deduplication to ensure gains reflect improved recall rather than reasoning or memorization. Across three model families and multiple factual QA benchmarks, RL yields ~27% average relative gains, surpassing both training-time and inference-time baselines. Mechanistically, RL primarily redistributes probability mass over existing knowledge rather than acquiring new facts, moving correct answers from the low-probability tail into reliable greedy generations. Our data-attribution study reveals that the hardest examples are the most…
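The redistribution mechanism described in the abstract can be illustrated with a toy sketch. This is not the paper's training setup (the excerpt does not name the RL algorithm). It assumes a single question with a fixed candidate set, a softmax policy over those candidates, and exact gradient ascent on the expected binary reward. The candidate answers, starting logits, and all function names are invented for illustration.

```python
import math

# Hypothetical question: "What is the capital of Australia?"
ANSWERS = ["Sydney", "Melbourne", "Canberra"]
GOLD = "Canberra"
START_LOGITS = [2.0, 1.0, 0.0]  # the correct answer starts in the tail


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_reward(prediction, gold):
    """1.0 for a normalized exact match, else 0.0 (assumed matching rule)."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0


def greedy(logits, answers):
    """Answer with the highest logit, i.e. the greedy generation."""
    return answers[max(range(len(logits)), key=logits.__getitem__)]


def train(logits, answers, gold, steps=100, lr=1.0):
    """Gradient ascent on expected reward J = sum_i p_i * r_i.

    For a softmax policy, dJ/dz_i = p_i * (r_i - J). No candidate is added
    or removed: only the probability mass over the same answers moves.
    """
    logits = list(logits)  # do not mutate the caller's list
    rewards = [binary_reward(a, gold) for a in answers]
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        logits = [z + lr * p * (r - expected)
                  for z, p, r in zip(logits, probs, rewards)]
    return logits


if __name__ == "__main__":
    trained = train(START_LOGITS, ANSWERS, GOLD)
    print("greedy before:", greedy(START_LOGITS, ANSWERS))
    print("greedy after: ", greedy(trained, ANSWERS))
```

In this toy, the correct answer is already among the candidates with nonzero probability before training; the reward signal only raises its rank until greedy decoding selects it, which mirrors the abstract's claim at a very small scale.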
