TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning

Zhangchen Xu; Yuetai Li; Fengqing Jiang; Bhaskar Ramasubramanian; Luyao Niu; Bill Yuchen Lin; Radha Poovendran

arXiv:2505.14625·cs.LG·May 23, 2025

Open Access · 1 Repo · 4 Models · 2 Datasets

TL;DR

This paper identifies false negatives in verifiers used for RL training of LLMs, analyzes their impact, and proposes TinyV, a lightweight verifier that improves reward accuracy, leading to better math reasoning performance.

Contribution

The paper introduces TinyV, a lightweight LLM-based verifier that reduces false negatives in RL training of LLMs, enhancing reward reliability and model performance.
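As a rough illustration of how an LLM-based verifier can sit alongside a rule-based one, the sketch below gives a response full reward if a strict rule-based check accepts it, and otherwise asks a second judge before assigning zero. This is a minimal sketch under stated assumptions, not the authors' implementation: the names rule_based_verify, hybrid_reward, and toy_judge are hypothetical, the fallback order is assumed, and toy_judge is a simple numeric-equivalence stand-in for an actual LLM call.

```python
from fractions import Fraction
from typing import Callable


def rule_based_verify(response: str, reference: str) -> bool:
    """Strict check: exact match after trimming, lowercasing, and removing spaces.

    Hypothetical stand-in for a rule-based verifier.
    """
    def normalize(s: str) -> str:
        return s.strip().lower().replace(" ", "")

    return normalize(response) == normalize(reference)


def toy_judge(response: str, reference: str) -> bool:
    """Hypothetical stand-in for an LLM judge.

    Accepts answers that are numerically equal, e.g. "0.5" and "1/2".
    A real system would query a language model here instead.
    """
    try:
        return Fraction(response.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        return False


def hybrid_reward(
    response: str,
    reference: str,
    llm_judge: Callable[[str, str], bool],
) -> float:
    """Return 1.0 if either verifier accepts the response, else 0.0.

    Assumption: the judge is consulted only for responses that the
    rule-based check rejects, so it can only recover false negatives.
    """
    if rule_based_verify(response, reference):
        return 1.0
    return 1.0 if llm_judge(response, reference) else 0.0


if __name__ == "__main__":
    # "0.5" is correct but fails the exact string match against "1/2".
    print(rule_based_verify("0.5", "1/2"))
    print(hybrid_reward("0.5", "1/2", toy_judge))
```

Calling the second judge only on rejected responses keeps the extra cost limited to the cases where a false negative is possible.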

Findings

01. Over 38% of model-generated responses in the Big-Math-RL-Verified dataset suffer from false negatives.

02. TinyV improves pass rates by up to 10% on math reasoning benchmarks.

03. Integrating TinyV accelerates RL convergence.

Abstract

Reinforcement Learning (RL) has become a powerful tool for enhancing the reasoning abilities of large language models (LLMs) by optimizing their policies with reward signals. Yet RL's success relies on the reliability of rewards, which are provided by verifiers. In this paper, we expose and analyze a widespread problem, false negatives, where verifiers wrongly reject correct model outputs. Our in-depth study of the Big-Math-RL-Verified dataset reveals that over 38% of model-generated responses suffer from false negatives, where the verifier fails to recognize correct answers. We show, both empirically and theoretically, that these false negatives severely impair RL training by depriving the model of informative gradient signals and slowing convergence. To mitigate this, we propose TinyV, a lightweight LLM-based verifier that augments existing rule-based methods, which dynamically…
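The claim that false negatives deprive the model of informative gradient signals can be illustrated with a small numeric sketch. It assumes a group-normalized advantage (each reward minus the group mean, divided by the group standard deviation), which is a common choice in RL for LLMs but is not named in the text above; the function group_advantages and the reward values are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-normalized advantages: (reward - group mean) / (group std + eps).

    Assumption: an illustrative advantage definition, not necessarily
    the one used in the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt; the 1st and 3rd are actually correct.
clean_rewards = [1.0, 0.0, 1.0, 0.0]  # verifier accepts both correct responses
noisy_rewards = [0.0, 0.0, 0.0, 0.0]  # verifier wrongly rejects both

if __name__ == "__main__":
    # With accurate rewards, correct responses receive positive advantages
    # and incorrect ones receive negative advantages.
    print(group_advantages(clean_rewards))
    # With false negatives, every reward equals the group mean, so every
    # advantage is zero and this prompt contributes no learning signal.
    print(group_advantages(noisy_rewards))
```

In this toy case the prompt is effectively wasted: the model produced correct answers, but the verifier's rejections make the group indistinguishable from one in which every response failed.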

Code & Models

Repositories

uw-nsl/tinyv
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications