NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

Wei Liu; Siya Qi; Xinyu Wang; Chen Qian; Yali Du; Yulan He

arXiv:2505.16022 · cs.CL · September 4, 2025



TL;DR

NOVER is a verifier-free reinforcement learning framework for language models. It strengthens reasoning using only supervised fine-tuning data, with no external verifier, and outperforms same-size models distilled from large reasoning models.

Contribution

NOVER presents an incentive training method that eliminates the need for external verifiers, extending incentive training beyond domains such as mathematics and coding, where verifiers are readily available, and improving language model performance.

Findings

01. Outperforms same-size models distilled from large reasoning models by 7.7%.
02. Enables incentive training across diverse text-to-text tasks.
03. Supports inverse incentive training for further optimization.

Abstract

Recent advances such as DeepSeek R1-Zero highlight the effectiveness of incentive training, a reinforcement learning paradigm that computes rewards solely based on the final answer part of a language model's output, thereby encouraging the generation of intermediate reasoning steps. However, these methods fundamentally rely on external verifiers, which limits their applicability to domains like mathematics and coding where such verifiers are readily available. Although reward models can serve as verifiers, they require high-quality annotated data and are costly to train. In this work, we propose NOVER, NO-VERifier Reinforcement Learning, a general reinforcement learning framework that requires only standard supervised fine-tuning data with no need for an external verifier. NOVER enables incentive training across a wide range of text-to-text tasks and outperforms the model of the same…
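The answer-only reward that the abstract attributes to incentive training can be sketched as follows. This is a minimal illustration of the verifier-based setup that NOVER is designed to replace, not NOVER's own reward. The `<think>`/`<answer>` output format, the normalized exact-match check standing in for an external verifier, and all function names are hypothetical.

```python
import re

# Hypothetical output format (an assumption for illustration only):
# reasoning inside <think>...</think>, final answer inside <answer>...</answer>.
OUTPUT_RE = re.compile(r"\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*", re.DOTALL)


def split_output(output: str):
    """Return (reasoning, answer), or None if the output breaks the format."""
    match = OUTPUT_RE.fullmatch(output)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def exact_match_verifier(answer: str, reference: str) -> bool:
    """Stand-in for an external verifier: normalized exact string match."""
    return _normalize(answer) == _normalize(reference)


def incentive_reward(output: str, reference: str) -> float:
    """Score only the final answer; the reasoning text is never inspected."""
    parts = split_output(output)
    if parts is None:
        return 0.0  # malformed outputs receive no reward
    _reasoning, answer = parts
    return 1.0 if exact_match_verifier(answer, reference) else 0.0


print(incentive_reward("<think>2 + 2 = 4</think><answer>4</answer>", "4"))
```

Because only the answer is scored, the policy is free to generate whatever intermediate reasoning helps it reach a correct answer. The weak point is `exact_match_verifier`: it only works when answers can be checked mechanically, which is exactly the dependence on external verifiers that the abstract identifies as limiting.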


Code & Models

Repositories

thinkwee/nover (PyTorch, official)

Videos

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning (Underline)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques