Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains

Zhonghang Yuan; Zhefan Wang; Fang Hu; Zihong Chen; Jinzhe Li; Gang Li; Jie Ying; Huanjun Kong; Songyang Zhang; Nanqing Dong

arXiv:2605.18261 · cs.CL · May 19, 2026

TL;DR

This paper introduces K2V, a framework that extends RLVR to knowledge-intensive domains by automatically synthesizing verifiable data and verifying the model's reasoning process, improving LLM reasoning without significantly compromising general capabilities.

Contribution

K2V is a novel framework that combines automated data synthesis and reasoning verification for RLVR in knowledge-intensive domains.

Findings

01. K2V improves LLM reasoning in knowledge-intensive tasks.

02. Automated data synthesis enhances training data quality.

03. Verification of reasoning processes leads to better model performance.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models (LLMs) in domains such as mathematics and coding. However, its application to knowledge-intensive domains has not been effectively explored because high-quality verifiable data is scarce. Furthermore, current RLVR focuses solely on the correctness of final answers, which leaves flawed reasoning undetected and makes reward signals sparse. In this work, we propose Knowledge-to-Verification (K2V), a framework that extends RLVR to knowledge-intensive domains through automated verifiable data synthesis, while enabling verification of the LLM's reasoning process. Extensive experiments demonstrate that K2V enhances the reasoning of LLMs in knowledge-intensive domains without significantly compromising the model's general capabilities.…
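
The contrast the abstract draws between answer-only rewards and rewards that also check the reasoning can be illustrated with a minimal sketch. This is not K2V's implementation: the function names, the keyword checklist (`required_points`), and the `process_weight` mixing parameter are all hypothetical, chosen only to show why a process-aware signal is denser than a binary outcome signal.

```python
from typing import List


def outcome_reward(predicted: str, reference: str) -> float:
    """Outcome-only reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def process_reward(reasoning: str, required_points: List[str]) -> float:
    """Fraction of checklist items that appear in the reasoning text.

    The checklist is a hypothetical stand-in for whatever verifiable
    criteria a real system would check; simple substring matching is
    used here only to keep the sketch self-contained.
    """
    if not required_points:
        return 0.0
    text = reasoning.lower()
    hits = sum(1 for point in required_points if point.lower() in text)
    return hits / len(required_points)


def combined_reward(
    predicted: str,
    reference: str,
    reasoning: str,
    required_points: List[str],
    process_weight: float = 0.5,  # hypothetical mixing weight
) -> float:
    """Weighted mix of the outcome reward and the process reward."""
    outcome = outcome_reward(predicted, reference)
    process = process_reward(reasoning, required_points)
    return (1.0 - process_weight) * outcome + process_weight * process
```

With an answer-only reward, a response with a wrong final answer scores 0.0 regardless of how sound its reasoning was. In the combined form, the same response still earns partial credit for each checklist item it covers, so the training signal is less sparse.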

Code & Models

Repositories

SeedScientist/K2V (GitHub)
