DEBATE, TRAIN, EVOLVE: Self Evolution of Language Model Reasoning

Gaurav Srivastava; Zhenyu Bi; Meng Lu; Xuan Wang

arXiv:2505.15734·cs.CL·October 1, 2025

TL;DR

This paper introduces DTE, a self-evolving, ground truth-free training framework that uses multi-agent debate traces and a new prompting strategy to improve a language model's reasoning accuracy and generalization without external supervision.

Contribution

The paper proposes a novel self-evolution framework using multi-agent debate traces and a new prompting strategy, advancing autonomous reasoning improvement in language models.

Findings

01

Achieved an average accuracy gain of 8.92% on the GSM-PLUS dataset.

02

Demonstrated strong cross-domain generalization with a 5.8% average accuracy gain.

03

Validated effectiveness across seven reasoning benchmarks with six open-weight models.

Abstract

Large language models (LLMs) have improved significantly in their reasoning through extensive training on massive datasets. However, relying solely on additional data for improvement is becoming increasingly impractical, highlighting the need for models to autonomously enhance their reasoning without external supervision. In this paper, we propose Debate, Train, Evolve (DTE), a novel ground truth-free training framework that uses multi-agent debate traces to evolve a single language model. We also introduce a new prompting strategy, Reflect-Critique-Refine, to improve debate quality by explicitly instructing agents to critique and refine their reasoning. Extensive evaluations on seven reasoning benchmarks with six open-weight models show that our DTE framework achieves substantial improvements, with an average accuracy gain of 8.92% on the challenging GSM-PLUS dataset. Furthermore, we…
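To make the idea of a ground truth-free signal concrete, the following is a minimal sketch of how a multi-agent debate could yield a consensus answer without any labels. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how consensus is formed, so majority voting is assumed here, and the names `debate`, `fixed_agent`, and `follower_agent` are hypothetical. The toy agents stand in for language model calls.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical agent type: receives the question and the peers' previous
# answers, and returns its own answer as a string.
Agent = Callable[[str, List[str]], str]


def debate(
    question: str, agents: List[Agent], max_rounds: int = 3
) -> Tuple[Optional[str], List[List[str]]]:
    """Run a toy debate and return (consensus answer or None, per-round answers).

    Assumption: consensus is a strict majority vote over the final round.
    """
    trace: List[List[str]] = []
    answers: List[str] = []
    for _ in range(max_rounds):
        # Each agent sees the other agents' answers from the previous round
        # (an empty list in the first round).
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
        trace.append(answers)
        if len(set(answers)) == 1:
            break  # every agent agrees, so stop early
    if not answers:
        return None, trace
    top, count = Counter(answers).most_common(1)[0]
    consensus = top if count > len(agents) / 2 else None
    return consensus, trace


def fixed_agent(answer: str) -> Agent:
    """Toy agent that always gives the same answer."""
    return lambda question, peers: answer


def follower_agent(initial: str) -> Agent:
    """Toy agent that adopts the most common peer answer once it sees one."""
    def agent(question: str, peers: List[str]) -> str:
        if not peers:
            return initial
        return Counter(peers).most_common(1)[0][0]
    return agent
```

In a sketch like this, the returned trace and consensus answer would be the raw material for a later training step; how DTE actually turns debate traces into training data is not described in the abstract and is left out here.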

Taxonomy

Topics: Natural Language Processing Techniques · Multi-Agent Systems and Negotiation · Topic Modeling