DARL: Encouraging Diverse Answers for General Reasoning without Verifiers

Chongxuan Huang; Lei Lin; Xiaodong Shi; Wenping Hu; Ruiming Tang

arXiv:2601.14700·cs.CL·January 22, 2026

Open Access

TL;DR

DARL is a reinforcement learning framework that promotes diverse, high-quality answers in large language models without relying on domain-specific verifiers, improving reasoning and output variety.

Contribution

DARL introduces a simple, effective method to encourage answer diversity in general reasoning tasks without extra verifiers, compatible with existing RL approaches.

Findings

01. DARL outperforms RLPR on multiple benchmarks.

02. DARL achieves a 1.3-point average gain on reasoning benchmarks.

03. DARL achieves a 9.5-point average gain on general benchmarks.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated promising gains in enhancing the reasoning capabilities of large language models. However, its dependence on domain-specific verifiers significantly restricts its applicability to open and general domains. Recent efforts such as RLPR have extended RLVR to general domains, enabling training on broader datasets and achieving improvements over RLVR. However, a notable limitation of these methods is their tendency to overfit to reference answers, which constrains the model's ability to generate diverse outputs. This limitation is particularly pronounced in open-ended tasks such as writing, where multiple plausible answers exist. To address this, we propose DARL, a simple yet effective reinforcement learning framework that encourages the generation of diverse answers within a controlled deviation range from the reference…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques