PC3D: Zero-Shot Cooperation Across Variable Rosters via Personalized Context Distillation
Ahmet Onur Akman, Rafał Kucharski

TL;DR
PC3D enables decentralized agents to adapt to varying team sizes in cooperative multi-agent reinforcement learning by distilling personalized coordination contexts from local histories.
Contribution
The paper introduces PC3D, a novel method for training decentralized policies that recover and use personalized coordination contexts from local histories, without online retraining.
Findings
PC3D outperforms baselines on three MARL benchmarks.
It achieves higher returns with both seen and unseen team sizes.
Ablation studies confirm the importance of both context distillation and the adaptive use of the recovered context.
Abstract
Cooperative multi-agent reinforcement learning often assumes a fixed execution team, yet many decentralized systems must operate with varying numbers of active agents during deployment. We study this setting under episodic roster variation: each episode is executed by a set of homogeneous agents, with the team size varying across episodes. Agents act only from local histories, without execution-time communication, privileged coordinators, or online retraining. Therefore, effective cooperation requires each agent to recover relevant context about the active team and adapt its behavior accordingly. To this end, we propose PC3D (Personalized Central Coordination Context Distillation), a method for training decentralized policies to recover and use personalized coordination context from local interaction histories. During training, a set-structured centralized teacher compresses the active…
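The abstract is cut off before it describes the teacher's internals, so the following is only an illustrative sketch under our own assumptions, not the authors' architecture. It shows why a set-structured teacher suits variable rosters: a permutation-invariant aggregator (here, Deep Sets-style mean pooling) yields a fixed-size team summary for any number of agents, and concatenating an agent's own embedding personalizes it. A student that sees only the agent's local observation history is then scored against that context with a mean-squared-error distillation loss. All names (`teacher_context`, `student_context`, `distillation_loss`, `PHI`, `RHO`, `STUDENT`) and the random weights are hypothetical.

```python
import random
from typing import List

Vec = List[float]


def linear(weights: List[Vec], x: Vec) -> Vec:
    """Apply a weight matrix (given as a list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x: Vec) -> Vec:
    return [max(0.0, v) for v in x]


def make_weights(rows: int, cols: int, seed: int) -> List[Vec]:
    """Fixed pseudo-random weights; stand-ins for learned parameters."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


FEAT_DIM, HID_DIM = 3, 4
# Hypothetical parameters, not taken from the paper.
PHI = make_weights(HID_DIM, FEAT_DIM, seed=0)      # per-agent encoder
RHO = make_weights(HID_DIM, 2 * HID_DIM, seed=1)   # personalized context head
STUDENT = make_weights(HID_DIM, FEAT_DIM, seed=2)  # local-history encoder


def teacher_context(team: List[Vec], i: int) -> Vec:
    """Centralized, personalized context for agent i (assumed design).

    Mean pooling over all active agents is order-invariant and defined for
    any team size; concatenating agent i's own embedding personalizes it.
    """
    embeds = [relu(linear(PHI, f)) for f in team]
    pooled = [sum(col) / len(embeds) for col in zip(*embeds)]
    return linear(RHO, embeds[i] + pooled)


def student_context(history: List[Vec]) -> Vec:
    """Context estimated from agent i's own observation history only.

    Averaging observations is a placeholder for a recurrent history encoder.
    """
    mean_obs = [sum(col) / len(history) for col in zip(*history)]
    return linear(STUDENT, mean_obs)


def distillation_loss(student: Vec, teacher: Vec) -> float:
    """Mean squared error between student and (fixed) teacher contexts."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)


team = [[0.1, 0.5, -0.2], [0.3, -0.1, 0.4], [0.0, 0.2, 0.2]]
target = teacher_context(team, 0)
estimate = student_context([[0.1, 0.5, -0.2], [0.2, 0.4, 0.0]])
print(distillation_loss(estimate, target))
```

Because the team summary is a mean, `teacher_context` returns a context of the same size whether the roster has two agents or twenty, and reordering the other agents leaves it unchanged up to floating-point rounding. In training, one would minimize `distillation_loss` with respect to the student's parameters while holding the teacher's output fixed; that gradient step is omitted here.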
