DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning

Zhongwei Wan, Yun Shen, Zhihao Dou, Donghao Zhou, Yu Zhang, Xin Wang, Hui Shen, Jing Xiong, Chaofan Tao, Zixuan Zhong, Peizhou Huang, and Mi Zhang

arXiv:2602.19895·cs.LG·February 24, 2026

TL;DR

This paper introduces DSDR, a dual-scale diversity regularization framework for reinforcement learning with verifiers in large language model reasoning, enhancing exploration and accuracy.

Contribution

The paper proposes a novel dual-scale regularization method that decomposes diversity into global and local components, improving exploration in LLM reasoning tasks.

Findings

1. DSDR improves accuracy and pass@k in reasoning benchmarks.
2. It maintains diversity among correct reasoning trajectories.
3. Theoretical analysis supports optimal correctness preservation.

Abstract

Reinforcement learning with verifiers (RLVR) is a central paradigm for improving large language model (LLM) reasoning, yet existing methods often suffer from limited exploration. Policies tend to collapse onto a few reasoning patterns and prematurely stop deep exploration, while conventional entropy regularization introduces only local stochasticity and fails to induce meaningful path-level diversity, leading to weak and unstable learning signals in group-based policy optimization. We propose DSDR, a Dual-Scale Diversity Regularization reinforcement learning framework that decomposes diversity in LLM reasoning into global and local components. Globally, DSDR promotes diversity among correct reasoning trajectories to explore distinct solution modes. Locally, it applies a length-invariant, token-level entropy regularization restricted to correct trajectories, preventing entropy…
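
The local term mentioned in the abstract (a length-invariant, token-level entropy restricted to correct trajectories) can be illustrated with a small sketch. This is not the paper's formulation: the function names are hypothetical, length invariance is assumed here to mean averaging per-token entropy over the trajectory, and each trajectory is assumed to be given as a list of next-token probability distributions with a boolean verifier verdict.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_token_entropy(token_dists):
    """Average per-token entropy of a trajectory.

    Assumption: dividing by the number of tokens is used as the
    length normalization, so long and short trajectories are comparable.
    """
    if not token_dists:
        return 0.0
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)


def correct_only_entropy_bonus(trajectories, is_correct):
    """Mean length-normalized entropy over verifier-approved trajectories.

    Trajectories marked incorrect contribute nothing; if no trajectory
    in the group is correct, the bonus is 0.0.
    """
    kept = [
        mean_token_entropy(traj)
        for traj, ok in zip(trajectories, is_correct)
        if ok
    ]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    uniform = [0.5, 0.5]
    group = [[uniform] * 2, [uniform] * 4, [[1.0, 0.0]] * 3]
    flags = [True, True, False]
    print(correct_only_entropy_bonus(group, flags))
```

In this sketch the two correct trajectories have different lengths but the same per-token entropy, so they contribute equally; the incorrect trajectory is ignored regardless of its entropy.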

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)