RLVR Training of LLMs Does Not Improve Thinking Ability for General QA: Evaluation Method and a Simple Solution

Kaiyuan Li; Jing-Cheng Pang; Yang Yu

arXiv:2603.20799·cs.CL·March 24, 2026

Open Access

TL;DR

Reinforcement learning from verifiable rewards (RLVR) improves reasoning on verifiable tasks, but those gains do not automatically carry over to general question answering (GQA). Explicit GQA training, such as the proposed START method, is still needed to improve both the thinking process and the final answers.

Contribution

The paper introduces a Cross-Generation evaluation framework for measuring reasoning quality, shows that RLVR gains transfer only weakly to general question answering (GQA), and proposes START, a simple training method that improves GQA performance.

Findings

01. RLVR improves reasoning on verifiable tasks but not on GQA.

02. Explicit GQA training remains necessary despite RLVR.

03. START enhances reasoning and answer quality on GQA benchmarks.

Abstract

Reinforcement learning from verifiable rewards (RLVR) stimulates the thinking processes of large language models (LLMs), substantially enhancing their reasoning abilities on verifiable tasks. It is often assumed that similar gains should transfer to general question answering (GQA), but this assumption has not been thoroughly validated. To assess whether RLVR automatically improves LLM performance on GQA, we propose a Cross-Generation evaluation framework that measures the quality of intermediate reasoning by feeding the generated thinking context into LLMs of varying capabilities. Our evaluation leads to a discouraging finding: the efficacy of the thinking process on GQA tasks is markedly lower than on verifiable tasks, suggesting that explicit training on GQA remains necessary in addition to training on verifiable tasks. We further observe that direct RL training on GQA is less…
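The Cross-Generation idea described in the abstract, feeding one model's generated thinking into answer models of varying capability, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `cross_generation_gain`, `Answerer`, and `Judge`, the prompt layout, and the choice to report the mean score gain over a no-thinking baseline. The paper's actual metric and prompts may differ.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces (not from the paper):
# an Answerer maps a prompt to an answer string,
# a Judge maps (question, answer) to a score in [0, 1].
Answerer = Callable[[str], str]
Judge = Callable[[str, str], float]


def cross_generation_gain(
    questions: List[str],
    thinking: List[str],
    answerers: Dict[str, Answerer],
    judge: Judge,
) -> Dict[str, float]:
    """Mean score gain per answerer when the thinking trace is prepended.

    Assumed metric: score(with thinking) - score(without thinking),
    averaged over all questions.
    """
    if not questions or len(questions) != len(thinking):
        raise ValueError("need one thinking trace per question")

    gains: Dict[str, float] = {}
    for name, answer in answerers.items():
        total = 0.0
        for question, trace in zip(questions, thinking):
            baseline = judge(question, answer(question))
            # Assumed prompt layout: thinking context first, then the question.
            with_context = judge(question, answer(f"{trace}\n\n{question}"))
            total += with_context - baseline
        gains[name] = total / len(questions)
    return gains
```

Under this sketch, a thinking trace counts as useful if it raises the scores of answer models across capability levels, and a gain near zero for every answerer would suggest the trace contributes little beyond what the answerer already produces alone.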

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques