Zero Shot Coordination for Sparse Reward Tasks with Diverse Reward Shapings
Keenan Powell, Peihong Yu, Pratap Tokekar

TL;DR
This paper introduces a method for Zero-Shot Coordination in multi-agent reinforcement learning that effectively handles diverse reward shapings, improving cooperation with agents trained under different reward structures.
Contribution
It proposes training an ensemble of methods with randomized reward shapings to enhance zero-shot coordination in environments with diverse reward structures.
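The idea of pairing one shared sparse objective with randomized shaping terms can be sketched as follows. This is a minimal illustration, not the authors' implementation. The event names, the weight range, and the uniform sampling are all assumptions made for the example. The paper's own selection methods are not described in this summary, so plain uniform sampling stands in for them here.

```python
import random
from dataclasses import dataclass

# Hypothetical shaping events for an Overcooked-like task.
# These names are illustrative assumptions, not taken from the paper.
EVENTS = ("pickup_ingredient", "place_in_pot", "pickup_dish", "pickup_soup")


@dataclass
class RewardShaping:
    """One shaping: a weight per intermediate event, added to the sparse reward."""
    weights: dict

    def shaped_reward(self, sparse_reward, event_counts):
        # The sparse objective is shared by all agents; only the bonus differs.
        bonus = sum(self.weights[e] * event_counts.get(e, 0) for e in EVENTS)
        return sparse_reward + bonus


def sample_shaping(rng, low=0.0, high=5.0):
    # Assumption: weights are drawn uniformly at random from [low, high].
    return RewardShaping({e: rng.uniform(low, high) for e in EVENTS})


def build_ensemble_shapings(n_members, seed=0):
    # One independently sampled shaping per ensemble member.
    rng = random.Random(seed)
    return [sample_shaping(rng) for _ in range(n_members)]


if __name__ == "__main__":
    for i, shaping in enumerate(build_ensemble_shapings(3, seed=42)):
        reward = shaping.shaped_reward(20.0, {"place_in_pot": 3, "pickup_soup": 1})
        print(f"member {i}: shaped reward = {reward:.2f}")
```

Each ensemble member would then be trained against its own `RewardShaping`, while evaluation uses only the sparse reward so that members remain comparable.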
Findings
Achieved a 62.2% to 119.2% improvement in sparse-reward performance.
Demonstrated consistent improvements in the Overcooked environment.
Addressed the challenge of cooperating with agents with different reward shapings.
Abstract
Many Multi-Agent Reinforcement Learning (MARL) agents fail to adapt properly when cooperating with agents trained with the same objectives but different seeds, algorithms, or other training differences. This is the problem of Zero-Shot Coordination (ZSC), which focuses on training agents to cooperate well with unknown agents. ZSC has been studied for a variety of tabular cases and simple games such as Hanabi, achieving excellent results. However, existing solutions to ZSC assume identical rewards for the trained agents and all future partners. This assumption is unrealistic: it ignores the problem of cooperating with agents that share the same sparse objectives but shape the rewards for those objectives differently. To address this issue, we show how to train an ensemble of methods using randomized reward shapings chosen using 4 selection…
