Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement

Xiaofeng Zhou; Heyan Huang; Lizi Liao

arXiv:2506.03541 · cs.CL · June 5, 2025

TL;DR

This paper introduces a novel multi-agent feedback framework using debate and tree-structured preference optimization to enhance the performance of smaller language models efficiently.

Contribution

The paper proposes a Debate and Reflect (D&R) framework, combined with Tree-structured Direct Preference Optimization (T-DPO), for training smaller models, and reports that it surpasses existing distillation and feedback methods.

Findings

01 Significant accuracy improvements in smaller models

02 Enhanced robustness and generalization capabilities

03 Outperforms baseline distillation techniques

Abstract

Large Language Models (LLMs) continue to set new standards in knowledge-intensive and complex reasoning tasks, yet their high computational demands limit widespread adoption. While distilling large models into smaller ones offers a sustainable solution, current techniques (such as static knowledge distillation, resource-intensive reinforcement learning from human feedback, or limited self-reflection) struggle to yield substantial and lasting performance gains. In this paper, we present a novel Debate and Reflect (D&R) framework that orchestrates multi-turn debates between smaller models and stronger teacher models, eliciting actionable feedback (e.g., error analysis, corrective strategies) to guide student models. Further, we introduce Tree-structured Direct Preference Optimization (T-DPO) to efficiently leverage these debate logs, organizing interactions into a hierarchical format for…
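
The abstract is truncated here, so the exact T-DPO construction is not visible on this page. As a rough illustration only, the sketch below shows one way debate logs could be organized into a tree and turned into preference pairs, scored with the standard DPO loss. The `DebateNode` class, the `extract_pairs` helper, the sibling-pairing rule, and the toy debate are all hypothetical and are not taken from the paper; only the DPO loss formula itself is standard.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DebateNode:
    """One turn in a debate log (hypothetical structure, not from the paper)."""
    text: str
    is_correct: bool = False
    children: list["DebateNode"] = field(default_factory=list)


def extract_pairs(node, prefix=()):
    """Return (context, chosen, rejected) triples.

    Assumption: sibling responses that share the same debate prefix are
    paired, with a correct sibling as 'chosen' and an incorrect one as
    'rejected'.
    """
    context = prefix + (node.text,)
    good = [c for c in node.children if c.is_correct]
    bad = [c for c in node.children if not c.is_correct]
    pairs = [(context, g.text, b.text) for g in good for b in bad]
    for child in node.children:
        pairs.extend(extract_pairs(child, context))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))


# Toy debate: a wrong first answer, followed by two candidate reflections.
root = DebateNode("Q: What is 2 + 3 * 4?", children=[
    DebateNode("A: 20", is_correct=False, children=[
        DebateNode("Reflect: multiply first, so 14", is_correct=True),
        DebateNode("Reflect: still 20", is_correct=False),
    ]),
    DebateNode("A: 14", is_correct=True),
])

pairs = extract_pairs(root)
for context, chosen, rejected in pairs:
    print(len(context), "turn(s) of context |", chosen, ">", rejected)
```

In this sketch each pair shares its full debate prefix, so a single debate yields pairs at several depths (first answers, then reflections). Whether the paper pairs nodes this way cannot be confirmed from the truncated abstract.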

Taxonomy

Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training