Bidirectional Distillation: A Mixed-Play Framework for Multi-Agent Generalizable Behaviors

Lang Feng; Jiahao Lin; Dong Xing; Li Zhang; De Ma; Gang Pan

arXiv:2505.11100 · cs.LG · May 19, 2025

TL;DR

This paper introduces Bidirectional Distillation, a mixed-play framework for multi-agent reinforcement learning that improves generalization to unseen co-players by alternating forward and reverse knowledge distillation, without storing a large pool of past policies.

Contribution

The paper proposes a new mixed-play framework, Bidirectional Distillation, that improves policy generalization in MARL by alternating between two distillation directions while avoiding the costly storage of past policies.

Findings

1. BiDist significantly improves generalization across a variety of tasks.

2. It effectively diversifies the policy distribution space.

3. Empirical results support the theoretical analysis of BiDist's effectiveness.

Abstract

Population-population generalization is a challenging problem in multi-agent reinforcement learning (MARL), particularly when agents encounter unseen co-players. However, existing self-play-based methods are constrained by the limitation of inside-space generalization. In this study, we propose Bidirectional Distillation (BiDist), a novel mixed-play framework, to overcome this limitation in MARL. BiDist leverages knowledge distillation in two alternating directions: forward distillation, which emulates the historical policies' space and creates an implicit self-play, and reverse distillation, which systematically drives agents towards novel distributions outside the known policy space in a non-self-play manner. In addition, BiDist operates as a concise and efficient solution without the need for the complex and costly storage of past policies. We provide both theoretical analysis and…
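The abstract does not give the distillation objectives, so the following is only a minimal sketch of what "two alternating directions" could mean, under stated assumptions. It assumes forward distillation is a gradient step that *decreases* KL(teacher || student) and reverse distillation is the same step with its sign flipped, so it *increases* that divergence. The `teacher` distribution stands in for a model that emulates the historical policy space. The abstract does not say which network plays teacher or student in each direction, and every name here (`distill_step`, `alternating_schedule`, the learning rate) is hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_step(student_logits: List[float], teacher_probs: List[float],
                 direction: str, lr: float = 0.5) -> List[float]:
    """One gradient step on KL(teacher || softmax(student_logits)).

    Hypothetical illustration, not the paper's objective. The gradient of this
    KL with respect to the student logits is softmax(logits) - teacher.
    'forward' descends it (imitate the teacher); 'reverse' ascends it (move away).
    """
    if direction not in ("forward", "reverse"):
        raise ValueError(f"unknown direction: {direction}")
    sign = 1.0 if direction == "forward" else -1.0
    student_probs = softmax(student_logits)
    grad = [s - t for s, t in zip(student_probs, teacher_probs)]
    return [z - sign * lr * g for z, g in zip(student_logits, grad)]


def alternating_schedule(num_phases: int) -> List[str]:
    """Hypothetical schedule that alternates forward and reverse phases."""
    return ["forward" if i % 2 == 0 else "reverse" for i in range(num_phases)]


if __name__ == "__main__":
    teacher = [0.7, 0.2, 0.1]  # hypothetical stand-in for the historical-policy model
    student = [0.0, 0.0, 0.0]  # uniform student policy over three actions
    for direction in alternating_schedule(4):
        before = kl_divergence(teacher, softmax(student))
        student = distill_step(student, teacher, direction)
        after = kl_divergence(teacher, softmax(student))
        print(f"{direction:8s} KL {before:.4f} -> {after:.4f}")
```

Flipping the sign is the simplest way to express "drive the policy away from the known space." Unbounded ascent would eventually push the student toward a degenerate distribution, so this illustrates only the direction of each phase and is not a training recipe.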

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Game Theory and Cooperation · Artificial Intelligence in Games

Methods: Knowledge Distillation