DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs

Yubo Shu; Zhewei Huang; Xin Wu; Chen Hu; Shuchang Zhou; Daxin Jiang

arXiv:2505.07049·cs.AI·May 13, 2025

Open Access

TL;DR

DialogueReason introduces a dialogue-based reasoning paradigm for large language models, improving reasoning diversity and coherency over traditional monologue approaches, especially on complex compound questions.

Contribution

The paper develops a dialogue-based reasoning framework, analyzes the weaknesses of monologue-style reasoning, and demonstrates improved performance on complex reasoning benchmarks.

Findings

01. DialogueReason outperforms monologue models on Compound-QA and other benchmarks.

02. Dialogue reasoning enhances interpretability and human interaction.

03. The approach inspires multi-agent system design in LLMs.

Abstract

We propose DialogueReason, a reasoning paradigm that uncovers the lost roles in monologue-style reasoning models, aiming to boost diversity and coherency of the reasoning process. Recent advances in RL-based large reasoning models have led to impressive long CoT capabilities and high performance on math and science benchmarks. However, these reasoning models rely mainly on monologue-style reasoning, which often limits reasoning diversity and coherency, frequently recycling fixed strategies or exhibiting unnecessary shifts in attention. Our work consists of an analysis of monologue reasoning patterns and the development of a dialogue-based reasoning approach. We first introduce the Compound-QA task, which concatenates multiple problems into a single prompt to assess both diversity and coherency of reasoning. Our analysis shows that Compound-QA exposes weaknesses in monologue reasoning,…
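The Compound-QA construction described in the abstract (several problems concatenated into a single prompt) can be illustrated with a minimal sketch. The prompt template, the function name `build_compound_prompt`, and the sample questions are all hypothetical; the abstract does not specify the exact format the authors use.

```python
from typing import List


def build_compound_prompt(questions: List[str]) -> str:
    """Concatenate independent questions into one prompt.

    The header wording and numbering scheme are assumptions made for
    illustration, not the template used in the paper.
    """
    if not questions:
        raise ValueError("need at least one question")
    header = (
        f"Answer all {len(questions)} questions below, "
        "giving a separate final answer for each."
    )
    # Number each question so the answers can be matched back afterwards.
    numbered = [
        f"Question {i}: {q.strip()}" for i, q in enumerate(questions, start=1)
    ]
    return "\n\n".join([header, *numbered])


# Hypothetical sample questions, chosen only to show the output shape.
prompt = build_compound_prompt([
    "What is 17 * 23?",
    "How many primes are there below 20?",
])
print(prompt)
```

A prompt built this way requires the model to switch strategies between unrelated problems while still finishing each one, which is how the task probes both diversity and coherency of reasoning.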

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks

Methods: Entropy Regularization · Proximal Policy Optimization · ADaptive gradient method with the OPTimal convergence rate