MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate

Jianze Wang; Ying Liu; Jinlong Chen; Xuchun Hu; Qilong Zhang; Yu Cao; Jun Wang; Hua Yang; Yong Xie; Qianglong Chen

arXiv:2605.01347 · cs.CL · May 5, 2026

TL;DR

MAD-OPD introduces a multi-agent debate framework to enhance on-policy distillation, surpassing single-teacher limits and stabilizing training in agentic tasks, achieving state-of-the-art results across multiple benchmarks.

Contribution

It proposes MAD-OPD, a multi-agent debate approach to on-policy distillation, and extends it to agentic tasks through On-Policy Agentic Distillation (OPAD) and a new divergence principle. Together these improve performance on both agentic and code benchmarks.

Findings

01. MAD-OPD outperforms all six baseline configurations.

02. It improves the average score on agentic benchmarks by 2.4% and the average score on code benchmarks by 3.7%.

03. The method ranks first across all tested configurations.

Abstract

On-policy distillation (OPD) trains a student on its own trajectories under token-level teacher supervision, but existing methods are capped by a single-teacher capability ceiling: when the teacher errs, the student inherits the error. OPD also remains largely unexplored in agentic tasks, where per-step errors compound across long trajectories and destabilize training. We propose MAD-OPD (Multi-Agent Debate-driven On-Policy Distillation), which breaks this ceiling by recasting the distillation teacher as a deliberative collective of teachers that debate over the student's on-policy state; the debate produces an emergent collective intelligence that supplies token-level supervision, with each teacher's contribution weighted by its post-debate confidence. To extend OPD to agentic tasks, we also introduce On-Policy Agentic Distillation (OPAD), which adds step-level sampling to stabilize…
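
The supervision signal described above can be illustrated with a small sketch. The debate itself is not modeled here, and several details are assumptions that the abstract does not specify:
- Each teacher ends the debate with one scalar confidence.
- Confidences are normalized to sum to 1.
- The "collective" teacher is a linear mixture of the teachers' next-token distributions.
- The per-token loss is the reverse KL divergence from the student to that mixture, evaluated at each position of the student's own sampled trajectory.

The function names (aggregate_teachers, opd_loss, and so on) are hypothetical. This is not the authors' implementation; the chiefovoavicii/MAD-OPD repository is the authoritative source.

```python
import math
from typing import List, Sequence


def normalize(confidences: Sequence[float]) -> List[float]:
    """Rescale non-negative post-debate confidences so they sum to 1 (assumed rule)."""
    total = sum(confidences)
    if total <= 0:
        raise ValueError("confidences must have a positive sum")
    return [c / total for c in confidences]


def aggregate_teachers(teacher_probs: Sequence[Sequence[float]],
                       confidences: Sequence[float]) -> List[float]:
    """Confidence-weighted mixture of the teachers' next-token distributions
    at a single position (hypothetical aggregation, not taken from the paper)."""
    weights = normalize(confidences)
    vocab_size = len(teacher_probs[0])
    return [sum(w * p[v] for w, p in zip(weights, teacher_probs))
            for v in range(vocab_size)]


def reverse_kl(student: Sequence[float], teacher: Sequence[float],
               eps: float = 1e-12) -> float:
    """KL(student || teacher) over the vocabulary at one position."""
    return sum(s * (math.log(s + eps) - math.log(t + eps))
               for s, t in zip(student, teacher) if s > 0)


def opd_loss(student_steps: Sequence[Sequence[float]],
             teacher_steps: Sequence[Sequence[Sequence[float]]],
             confidences: Sequence[float]) -> float:
    """Mean per-token reverse KL along one trajectory sampled by the student.

    student_steps[t] is the student's distribution at position t;
    teacher_steps[t] holds every teacher's distribution at that same position.
    """
    losses = [reverse_kl(s_dist, aggregate_teachers(t_dists, confidences))
              for s_dist, t_dists in zip(student_steps, teacher_steps)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy example: two teachers, a 3-token vocabulary, a 2-step trajectory.
    student = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    teachers = [
        [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1]],  # step 0: teacher A, teacher B
        [[0.2, 0.6, 0.2], [0.1, 0.8, 0.1]],  # step 1: teacher A, teacher B
    ]
    # Teacher B is assumed to be more confident after the debate.
    print(opd_loss(student, teachers, confidences=[1.0, 3.0]))
```

In the toy run, teacher B receives weight 0.75, so the collective distribution at step 0 is [0.75, 0.15, 0.1]. This illustrates the intended effect: a less confident (possibly erring) teacher pulls the target less. A real system would compute these quantities on logits in batched tensors rather than on Python lists.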

Code & Models

Repositories

chiefovoavicii/MAD-OPD (GitHub)