Efficient Generation of Diverse Cooperative Agents with World Models

Yi Loo; Akshunn Trivedi; Malika Meghjani

arXiv:2506.07450·cs.AI·June 10, 2025

Open Access

TL;DR

This paper introduces XPM-WM, a framework that uses a learned world model to generate simulated trajectories for Cross-play Minimization (XPM), improving the efficiency and scalability of generating diverse cooperative agents for zero-shot coordination.

Contribution

The paper presents a new method leveraging world models to generate simulated trajectories, reducing computational costs and enhancing diversity in cooperative agent training.

Findings

01 XPM-WM matches previous methods in population reward performance.

02 It is more sample efficient and more scalable to larger populations.

03 It is effective at generating diverse cooperative agents for zero-shot coordination (ZSC).

Abstract

A major bottleneck in the training process for Zero-Shot Coordination (ZSC) agents is the generation of partner agents that are diverse in collaborative conventions. Current Cross-play Minimization (XPM) methods for population generation can be very computationally expensive and sample inefficient as the training objective requires sampling multiple types of trajectories. Each partner agent in the population is also trained from scratch, despite all of the partners in the population learning policies of the same coordination task. In this work, we propose that simulated trajectories from the dynamics model of an environment can drastically speed up the training process for XPM methods. We introduce XPM-WM, a framework for generating simulated trajectories for XPM via a learned World Model (WM). We show XPM with simulated trajectories removes the need to sample multiple trajectories. In…

Taxonomy

Topics: Reinforcement Learning in Robotics · Human Motion and Animation · Robot Manipulation and Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings