Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents
Arrasy Rahman, Jiaxun Cui, Peter Stone

TL;DR
This paper introduces L-BRDiv, an algorithm that trains robust ad hoc teamwork agents by generating teammate policies that emulate the minimum coverage set, improving robustness across various cooperative problems.
Contribution
The paper proposes the L-BRDiv algorithm to generate teammate policies that better approximate the minimum coverage set, enhancing robustness in ad hoc teamwork agents.
Findings
L-BRDiv produces more robust agents than state-of-the-art methods.
L-BRDiv does not require extensive hyperparameter tuning.
L-BRDiv outperforms baselines by discovering diverse members of the minimum coverage set (MCS).
Abstract
Robustly cooperating with unseen agents and human partners presents significant challenges due to the diverse cooperative conventions these partners may adopt. Existing Ad Hoc Teamwork (AHT) methods address this challenge by training an agent with a population of diverse teammate policies obtained through maximizing specific diversity metrics. However, prior heuristic-based diversity metrics do not always maximize the agent's robustness in all cooperative problems. In this work, we first propose that maximizing an AHT agent's robustness requires it to emulate policies in the minimum coverage set (MCS), the set of best-response policies to any partner policies in the environment. We then introduce the L-BRDiv algorithm that generates a set of teammate policies that, when used for AHT training, encourage agents to emulate policies from the MCS. L-BRDiv works by solving a constrained…
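To make the MCS definition in the abstract concrete, here is a minimal sketch on a toy one-shot cooperative matrix game. It is not the L-BRDiv algorithm. The payoff matrix and the helper names `best_responses` and `coverage_set` are hypothetical, and the sketch assumes only a handful of deterministic partner policies, whereas the abstract defines the MCS over any partner policy in the environment.

```python
import numpy as np

# Hypothetical shared payoff matrix (an assumption for illustration only):
# rows are the AHT agent's actions, columns are four deterministic partner policies.
PAYOFF = np.array([
    [1.0, 0.0, 0.0, 0.9],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.4, 0.4, 0.4, 0.4],
])

def best_responses(payoff: np.ndarray) -> np.ndarray:
    """Return, for each partner policy (column), the agent action with the highest return."""
    return payoff.argmax(axis=0)

def coverage_set(payoff: np.ndarray) -> list[int]:
    """Return the distinct best-response actions across all listed partner policies."""
    return sorted({int(a) for a in best_responses(payoff)})

br = best_responses(PAYOFF)
mcs = coverage_set(PAYOFF)
print("best response per partner:", br.tolist())
print("coverage set (toy, pure partners only):", mcs)
```

In this toy game, partners 0 and 3 share the same best response, so four partner policies yield only three coverage-set members, and action 3 is never a best response to any listed partner. An agent that can emulate actions 0, 1, and 2 is therefore already optimal against every partner in this restricted set.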
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Reinforcement Learning in Robotics · Transportation and Mobility Innovations
Methods: High-Order Consensuses
