Efficient Generation of Diverse Cooperative Agents with World Models
Yi Loo, Akshunn Trivedi, Malika Meghjani

TL;DR
This paper introduces XPM-WM, a novel framework that uses learned world models to generate simulated trajectories, significantly improving the efficiency and scalability of creating diverse cooperative agents for zero-shot coordination.
Contribution
The paper presents a method that uses a learned world model to generate simulated trajectories for cross-play minimization (XPM), reducing the computational cost of training a diverse population of cooperative partner agents.
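This summary does not describe how the world model is built. To illustrate how a world model can stand in for the environment, the sketch below fits a dynamics model to a batch of real transitions and then rolls out arbitrary policy pairs entirely inside that model. It is not the paper's implementation. The paper's world model is learned, but its architecture is not given here, so a hypothetical count-based tabular model (`TabularWorldModel`) and a toy one-state coordination game (`real_env_step`) are used instead. Every name in the block is hypothetical.

```python
import random
from collections import defaultdict


def real_env_step(s, a1, a2):
    """Toy one-state coordination game (hypothetical): +1 only if both agents pick the same action."""
    return 0, 1.0 if a1 == a2 else 0.0


class TabularWorldModel:
    """Hypothetical count-based world model estimating P(s', r | s, a1, a2).

    Stands in for a learned neural dynamics model: once fitted, it can
    generate trajectories for any policy pair without real environment steps.
    """

    def __init__(self):
        # (s, a1, a2) -> {(s_next, r): count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, transitions):
        # transitions: iterable of (s, a1, a2, r, s_next) collected in the real env
        for s, a1, a2, r, s_next in transitions:
            self.counts[(s, a1, a2)][(s_next, r)] += 1

    def step(self, s, a1, a2, rng):
        outcomes = self.counts.get((s, a1, a2))
        if not outcomes:
            # Assumption: an unseen joint action leaves the state unchanged with zero reward.
            return s, 0.0
        keys = list(outcomes)
        weights = [outcomes[k] for k in keys]
        # Sample (s_next, r) in proportion to how often it was observed.
        return rng.choices(keys, weights=weights)[0]


def simulate(model, policy1, policy2, s0, horizon, rng):
    """Roll out a policy pair entirely inside the world model."""
    s, traj = s0, []
    for _ in range(horizon):
        a1, a2 = policy1(s), policy2(s)
        s_next, r = model.step(s, a1, a2, rng)
        traj.append((s, a1, a2, r, s_next))
        s = s_next
    return traj


def episode_return(traj):
    """Sum of rewards (index 3 of each transition tuple)."""
    return sum(step[3] for step in traj)


# Fit the model on a small batch of real transitions covering every joint action.
data = []
for _ in range(5):
    for a1 in (0, 1):
        for a2 in (0, 1):
            s_next, r = real_env_step(0, a1, a2)
            data.append((0, a1, a2, r, s_next))

wm = TabularWorldModel()
wm.fit(data)

rng = random.Random(0)
# Self-play (same convention) and cross-play (different conventions), both simulated.
sp = simulate(wm, lambda s: 0, lambda s: 0, s0=0, horizon=10, rng=rng)
xp = simulate(wm, lambda s: 0, lambda s: 1, s0=0, horizon=10, rng=rng)
print(episode_return(sp), episode_return(xp))  # 10.0 0.0
```

After fitting, evaluating a new pairing costs no real environment samples, because every step comes from `wm.step`. How useful those simulated returns are depends entirely on how accurately the model was fitted.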
Findings
XPM-WM matches previous methods in population reward.
It is more sample-efficient and scales better to larger populations.
It is effective at generating diverse cooperative agents for zero-shot coordination (ZSC).
Abstract
A major bottleneck in the training process for Zero-Shot Coordination (ZSC) agents is the generation of partner agents that are diverse in collaborative conventions. Current Cross-play Minimization (XPM) methods for population generation can be very computationally expensive and sample inefficient as the training objective requires sampling multiple types of trajectories. Each partner agent in the population is also trained from scratch, despite all of the partners in the population learning policies of the same coordination task. In this work, we propose that simulated trajectories from the dynamics model of an environment can drastically speed up the training process for XPM methods. We introduce XPM-WM, a framework for generating simulated trajectories for XPM via a learned World Model (WM). We show XPM with simulated trajectories removes the need to sample multiple trajectories. In…
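The abstract says the XPM training objective requires sampling multiple types of trajectories. The summary does not give the paper's exact objective. One common way to write a cross-play minimization objective, assumed here, rewards each agent i for a high self-play return and penalizes its average cross-play return with the other population members: J(i) = J_SP(i) - λ · mean over j≠i of J_XP(i, j). The sketch below computes that quantity for a toy population. The weight `lam`, the fixed-convention policies, and `coordination_return` are all hypothetical.

```python
def always(action):
    """Hypothetical fixed-convention policy that ignores the state."""
    return lambda s: action


def coordination_return(pi_a, pi_b, horizon=10):
    """Toy stand-in for one sampled episode: +1 per step when the two actions match."""
    s, total = 0, 0.0  # single-state game, so s never changes
    for _ in range(horizon):
        total += 1.0 if pi_a(s) == pi_b(s) else 0.0
    return total


def xpm_objectives(population, rollout_return, lam=0.5):
    """Per-agent objective J_SP(i) - lam * mean_{j != i} J_XP(i, j).

    Also returns how many rollouts were needed, to make the sampling cost visible.
    """
    n = len(population)
    objectives, n_rollouts = [], 0
    for i, pi_i in enumerate(population):
        sp_return = rollout_return(pi_i, pi_i)  # self-play trajectory
        n_rollouts += 1
        xp_total = 0.0
        for j, pi_j in enumerate(population):
            if j != i:
                xp_total += rollout_return(pi_i, pi_j)  # cross-play trajectory
                n_rollouts += 1
        xp_mean = xp_total / (n - 1) if n > 1 else 0.0
        objectives.append(sp_return - lam * xp_mean)
    return objectives, n_rollouts


# Agents 0 and 2 share a convention; agent 1 uses a different one.
population = [always(0), always(1), always(0)]
objs, cost = xpm_objectives(population, coordination_return, lam=0.5)
print(objs, cost)  # [7.5, 10.0, 7.5] 9
```

The agent with a unique convention scores highest. The two agents that share a convention are penalized because each coordinates well with the other. The rollout count grows as n² (n self-play plus n(n-1) cross-play episodes per evaluation), which illustrates the sampling burden the abstract attributes to XPM. In XPM-WM these episodes would be simulated by the learned world model instead, although this summary does not detail the exact mechanism.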
