Interactive Distillation for Cooperative Multi-Agent Reinforcement Learning

Minwoo Cho, Batuhan Altundas, and Matthew Gombolay

arXiv:2601.05407·cs.LG·January 12, 2026

TL;DR

HINT introduces a hierarchical knowledge distillation framework for cooperative multi-agent reinforcement learning, addressing key challenges in policy synthesis, out-of-distribution reasoning, and observation mismatches, leading to significant performance improvements.

Contribution

The paper presents HINT, a novel knowledge distillation (KD) framework that uses hierarchical RL and pseudo off-policy updates to improve teacher-student training in multi-agent reinforcement learning (MARL).

Findings

01. HINT outperforms baselines, with success-rate improvements of 60-165%.

02. HINT is effective in complex cooperative domains such as FireCommander and MARINE.

03. HINT improves out-of-distribution reasoning and the handling of observation mismatches.

Abstract

Knowledge distillation (KD) has the potential to accelerate MARL by employing a centralized teacher for decentralized students but faces key bottlenecks. Specifically, there are (1) challenges in synthesizing high-performing teaching policies in complex domains, (2) difficulties when teachers must reason in out-of-distribution (OOD) states, and (3) mismatches between the decentralized students' and the centralized teacher's observation spaces. To address these limitations, we propose HINT (Hierarchical INteractive Teacher-based transfer), a novel KD framework for MARL in a centralized training, decentralized execution setup. By leveraging hierarchical RL, HINT provides a scalable, high-performing teacher. Our key innovation, pseudo off-policy RL, enables the teacher policy to be updated using both teacher and student experience, thereby improving OOD adaptation. HINT also applies…
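The abstract does not say how pseudo off-policy RL combines the two experience sources, nor which distillation loss HINT uses. As a rough illustration only, the sketch below shows one generic way a teacher update could draw on both teacher and student experience, alongside a standard KL-based distillation loss. All names here (`sample_mixed_batch`, `student_fraction`, `distillation_loss`) are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Convert raw action logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits):
    """Generic KD loss: how far the student's policy is from the teacher's.

    Assumption: a plain KL term, not necessarily the loss used by HINT.
    """
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


def sample_mixed_batch(teacher_buffer, student_buffer, batch_size,
                       student_fraction=0.5, rng=None):
    """Draw a training batch from both teacher and student experience.

    Assumption: a fixed mixing ratio (student_fraction). The paper's actual
    mixing scheme is not described in the abstract.
    """
    rng = rng or random.Random(0)
    n_student = min(len(student_buffer), round(batch_size * student_fraction))
    n_teacher = min(len(teacher_buffer), batch_size - n_student)
    batch = [("student", t) for t in rng.sample(student_buffer, n_student)]
    batch += [("teacher", t) for t in rng.sample(teacher_buffer, n_teacher)]
    rng.shuffle(batch)
    return batch


# Toy usage: transitions are placeholder integers here.
teacher_buffer = list(range(100))
student_buffer = list(range(100, 200))
batch = sample_mixed_batch(teacher_buffer, student_buffer, batch_size=8)
loss = distillation_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0])
```

Including student-collected transitions in the teacher's batch is what exposes the teacher to states it would not visit under its own policy, which is the intuition behind the OOD adaptation claim in the abstract.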

Taxonomy

Topics: Reinforcement Learning in Robotics · Intelligent Tutoring Systems and Adaptive Learning · Human Pose and Action Recognition