Bidirectional Distillation: A Mixed-Play Framework for Multi-Agent Generalizable Behaviors
Lang Feng, Jiahao Lin, Dong Xing, Li Zhang, De Ma, Gang Pan

TL;DR
This paper introduces Bidirectional Distillation (BiDist), a mixed-play framework for multi-agent reinforcement learning (MARL) that improves generalization to unseen co-players by alternating forward and reverse knowledge distillation, without the need to store past policies.
Contribution
The paper proposes Bidirectional Distillation (BiDist), a mixed-play framework that improves policy generalization in MARL by alternating between two distillation directions while avoiding the costly storage of past policies.
Findings
BiDist significantly improves generalization to unseen co-players across a variety of tasks.
It effectively diversifies the policy distribution space.
Empirical results support the theoretical analysis of BiDist's effectiveness.
Abstract
Population-population generalization is a challenging problem in multi-agent reinforcement learning (MARL), particularly when agents encounter unseen co-players. However, existing self-play-based methods are constrained by the limitation of inside-space generalization. In this study, we propose Bidirectional Distillation (BiDist), a novel mixed-play framework, to overcome this limitation in MARL. BiDist leverages knowledge distillation in two alternating directions: forward distillation, which emulates the historical policies' space and creates an implicit self-play, and reverse distillation, which systematically drives agents towards novel distributions outside the known policy space in a non-self-play manner. In addition, BiDist operates as a concise and efficient solution without the need for the complex and costly storage of past policies. We provide both theoretical analysis and…
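The two alternating directions described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not state the actual objectives, so the use of KL divergence, the sign convention, the strict even/odd alternation schedule, and all names (`kl_divergence`, `distillation_term`, `direction_for_phase`, `weight`) are assumptions made purely for illustration. The idea shown is only that one direction shrinks the gap between an agent's policy and a reference built from past behavior, while the other direction enlarges it.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_term(agent_probs, reference_probs, direction, weight=1.0):
    """Hypothetical auxiliary term to be added to an agent's RL loss.

    direction="forward": positive KL, so minimizing the loss pulls the
        agent toward the reference distribution (assumed form).
    direction="reverse": negated KL, so minimizing the loss pushes the
        agent away from the reference distribution (assumed form).
    """
    kl = kl_divergence(reference_probs, agent_probs)
    if direction == "forward":
        return weight * kl
    if direction == "reverse":
        return -weight * kl
    raise ValueError(f"unknown direction: {direction!r}")


def direction_for_phase(phase_index):
    """Assumed schedule: even phases are forward, odd phases are reverse."""
    return "forward" if phase_index % 2 == 0 else "reverse"


if __name__ == "__main__":
    agent = [0.5, 0.5]       # toy agent policy over two actions
    reference = [0.9, 0.1]   # toy reference policy summarizing past behavior
    for phase in range(4):
        d = direction_for_phase(phase)
        print(phase, d, distillation_term(agent, reference, d))
```

In this sketch a single reference distribution stands in for "the known policy space", which is consistent with the abstract's point that no pool of past policies needs to be stored, though how the paper actually builds and updates that reference is not covered here.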
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Game Theory and Cooperation · Artificial Intelligence in Games
Methods: Knowledge Distillation
