TL;DR
This paper introduces Self-Debate Reinforcement Learning (SDRL), a training framework that enhances large language models by enabling them to learn from self-debate, improving both standalone reasoning and multi-agent debate performance.
Contribution
SDRL is a novel training method that prepares models for multi-agent debate by jointly optimizing for standalone and debate-conditioned reasoning capabilities.
Findings
SDRL improves multi-agent debate performance across various protocols.
SDRL enhances single-model reasoning abilities.
Experiments show consistent gains across multiple models and benchmarks.
Abstract
The reasoning abilities of large language models (LLMs) have been substantially improved by reinforcement learning with verifiable rewards (RLVR). At test time, collaborative reasoning through Multi-Agent Debate (MAD) has emerged as a promising approach for enhancing LLM performance. However, current RLVR methods typically train LLMs to solve problems in isolation, without explicitly preparing them to synthesize and benefit from the different rationales that arise during debate. In this work, we propose Self-Debate Reinforcement Learning (SDRL), a training framework in which models learn from self-debate, equipping a single LLM with both strong standalone problem-solving ability and the capability to process diverse reasoning trajectories in MAD. Given a prompt, SDRL first samples multiple candidate solutions, then constructs a debate context with diverse reasoning paths and generates…
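The first two steps named in the abstract (sampling candidate solutions, then building a debate context from diverse reasoning paths) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names `verifiable_reward`, `select_diverse`, `build_debate_context`, and `last_token`, the "distinct final answer" diversity heuristic, the exact-match reward, and the prompt wording are all assumptions made for the example. The steps after the point where the abstract is cut off are not covered.

```python
from collections import OrderedDict
from typing import Callable, List


def verifiable_reward(final_answer: str, reference: str) -> float:
    """Binary RLVR-style reward (assumed form): 1.0 on exact match after trimming."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def select_diverse(
    candidates: List[str],
    extract_answer: Callable[[str], str],
    max_paths: int,
) -> List[str]:
    """Keep the first candidate seen for each distinct final answer.

    Treating "different final answer" as "diverse" is an assumption of this
    sketch; the source does not say how diversity is measured.
    """
    by_answer: "OrderedDict[str, str]" = OrderedDict()
    for text in candidates:
        by_answer.setdefault(extract_answer(text), text)
    return list(by_answer.values())[:max_paths]


def build_debate_context(prompt: str, paths: List[str]) -> str:
    """Format the problem and the selected reasoning paths as one debate prompt."""
    lines = [f"Problem: {prompt}", ""]
    for i, path in enumerate(paths, start=1):
        lines.append(f"Agent {i}: {path}")
    lines.append("")
    lines.append("Considering the solutions above, give your own reasoning and final answer.")
    return "\n".join(lines)


def last_token(text: str) -> str:
    """Toy answer extractor (hypothetical): the last whitespace-separated token."""
    return text.split()[-1]


if __name__ == "__main__":
    # In a real pipeline these would be sampled from the model being trained.
    sampled = [
        "2 + 2 = 4 so the answer is 4",
        "counting up by two twice gives 4",
        "2 * 2 + 1 = 5",
    ]
    chosen = select_diverse(sampled, last_token, max_paths=2)
    print(build_debate_context("What is 2 + 2?", chosen))
```

In this sketch the debate context is plain text, so the same model that produced the candidates can be prompted with it again, which is one plausible reading of "self-debate".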
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Topic Modeling · Hate Speech and Cyberbullying Detection
