DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs
Yubo Shu, Zhewei Huang, Xin Wu, Chen Hu, Shuchang Zhou, Daxin Jiang

TL;DR
DialogueReason introduces a dialogue-based reasoning paradigm for large language models, improving reasoning diversity and coherency over traditional monologue approaches, especially on complex compound questions.
Contribution
It develops a novel dialogue-based reasoning framework, analyzes monologue reasoning weaknesses, and demonstrates improved performance on complex reasoning benchmarks.
Findings
DialogueReason outperforms monologue models on Compound-QA and other benchmarks.
Dialogue reasoning enhances interpretability and human interaction.
The approach inspires multi-agent system design in LLMs.
Abstract
We propose DialogueReason, a reasoning paradigm that uncovers the lost roles in monologue-style reasoning models, aiming to boost diversity and coherency of the reasoning process. Recent advances in RL-based large reasoning models have led to impressive long CoT capabilities and high performance on math and science benchmarks. However, these reasoning models rely mainly on monologue-style reasoning, which often limits reasoning diversity and coherency, frequently recycling fixed strategies or exhibiting unnecessary shifts in attention. Our work consists of an analysis of monologue reasoning patterns and the development of a dialogue-based reasoning approach. We first introduce the Compound-QA task, which concatenates multiple problems into a single prompt to assess both diversity and coherency of reasoning. Our analysis shows that Compound-QA exposes weaknesses in monologue reasoning,…
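The Compound-QA construction described in the abstract, concatenating multiple problems into a single prompt, can be illustrated with a minimal sketch. The function name `build_compound_prompt`, the numbered "Question i:" layout, and the header sentence are assumptions made here for illustration; the abstract does not specify the authors' actual prompt template.

```python
from typing import List


def build_compound_prompt(questions: List[str]) -> str:
    """Join several independent questions into one numbered prompt.

    Hypothetical helper: the template below is an assumption, not the
    format used in the paper.
    """
    if not questions:
        raise ValueError("need at least one question")
    # Header tells the model how many sub-problems it must answer.
    header = f"Answer all {len(questions)} of the following questions.\n"
    # Number each question so answers can be matched back to their source.
    body = "\n".join(
        f"Question {i}: {q.strip()}" for i, q in enumerate(questions, start=1)
    )
    return header + body


prompt = build_compound_prompt(
    ["What is 2 + 3?", "  Name the capital of France. "]
)
print(prompt)
```

Numbering the sub-questions makes it possible to score each answer separately, which is one plausible way to check whether a model stays coherent across all problems in the compound prompt.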
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
Methods: Entropy Regularization · Proximal Policy Optimization · ADaptive gradient method with the OPTimal convergence rate
