Interactive Distillation for Cooperative Multi-Agent Reinforcement Learning
Minwoo Cho, Batuhan Altundas, and Matthew Gombolay

TL;DR
HINT is a hierarchical knowledge distillation framework for cooperative multi-agent reinforcement learning (MARL). It addresses three challenges: synthesizing strong teacher policies, reasoning in out-of-distribution states, and mismatches between teacher and student observations. It improves success rates over baselines by 60-165%.
Contribution
The paper presents HINT, a novel knowledge distillation (KD) framework that uses hierarchical RL and pseudo off-policy updates to improve teacher-student training in MARL.
Findings
HINT outperforms baselines, with success-rate improvements of 60-165%.
It is effective in complex cooperative domains such as FireCommander and MARINE.
It improves out-of-distribution reasoning and the handling of observation mismatches.
Abstract
Knowledge distillation (KD) has the potential to accelerate MARL by employing a centralized teacher for decentralized students but faces key bottlenecks. Specifically, there are (1) challenges in synthesizing high-performing teaching policies in complex domains, (2) difficulties when teachers must reason in out-of-distribution (OOD) states, and (3) mismatches between the decentralized students' and the centralized teacher's observation spaces. To address these limitations, we propose HINT (Hierarchical INteractive Teacher-based transfer), a novel KD framework for MARL in a centralized training, decentralized execution setup. By leveraging hierarchical RL, HINT provides a scalable, high-performing teacher. Our key innovation, pseudo off-policy RL, enables the teacher policy to be updated using both teacher and student experience, thereby improving OOD adaptation. HINT also applies…
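As a rough illustration of the teacher-student setup described above, the sketch below computes a generic distillation loss: the KL divergence from a teacher's action distribution to a student's, over a discrete action set. This is a standard KD objective and not necessarily the loss HINT uses; the function names and the example logits are assumptions made for illustration. In a centralized-teacher, decentralized-student setting, the teacher logits would come from a policy that sees the global state and the student logits from a policy that sees only a local observation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student) over a discrete action set.

    Hypothetical helper: a generic KD objective, not the paper's exact loss.
    """
    log_p = log_softmax(teacher_logits)  # teacher, e.g. conditioned on global state
    log_q = log_softmax(student_logits)  # student, e.g. conditioned on local observation
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Illustrative logits only; these are not values from the paper.
teacher = [2.0, 0.5, -1.0]
student = [0.1, 0.2, 0.3]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student matches the teacher's distribution and positive otherwise, so minimizing it pulls the decentralized student toward the centralized teacher's action preferences.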
Taxonomy
Topics: Reinforcement Learning in Robotics · Intelligent Tutoring Systems and Adaptive Learning · Human Pose and Action Recognition
