Adversarially Guided Self-Play for Adopting Social Conventions
Mycal Tucker, Yilun Zhou, Julie Shah

TL;DR
This paper introduces Adversarial Self-Play (ASP), a training method that combines self-play with adversarial training on unpaired data so that robotic agents can adopt existing social conventions more efficiently and with closer adherence to the convention.
Contribution
The paper proposes ASP, an adversarial training approach that uses unpaired data to improve how robots learn social conventions, and supports it with theoretical analysis and empirical validation.
Findings
ASP achieves closer adherence to social conventions with minimal paired data.
ASP outperforms previous methods in learning efficiency and accuracy.
Empirical results show ASP's effectiveness across three domains.
Abstract
Robotic agents must adopt existing social conventions in order to be effective teammates. These social conventions, such as driving on the right or left side of the road, are arbitrary choices among optimal policies, but all agents on a successful team must use the same convention. Prior work has identified a method of combining self-play with paired input-output data gathered from existing agents in order to learn their social convention without interacting with them. We build upon this work by introducing a technique called Adversarial Self-Play (ASP) that uses adversarial training to shape the space of possible learned policies and substantially improves learning efficiency. ASP only requires the addition of unpaired data: a dataset of outputs produced by the social convention without associated inputs. Theoretical analysis reveals how ASP shapes the policy space and the…
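To make the role of unpaired data concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes a single decision (drive on the left or the right), a policy described by one number `p_right`, and a closed-form optimal discriminator in place of a trained one. The paired-data term from prior work is omitted for brevity, and every function name and the `adv_weight` parameter are hypothetical.

```python
import math


def self_play_reward(p_right):
    """Chance that two copies of the same policy pick the same side."""
    return p_right ** 2 + (1.0 - p_right) ** 2


def optimal_discriminator(p_data, p_model):
    """Closed-form optimal GAN discriminator for one output value.

    Assumption: stands in for a trained discriminator network.
    """
    denom = p_data + p_model
    return 0.5 if denom == 0.0 else p_data / denom


def adversarial_loss(p_right, unpaired_outputs, eps=1e-9):
    """Expected -log D(y) for outputs y sampled from the policy.

    unpaired_outputs holds convention outputs only (1 = right, 0 = left),
    with no associated inputs.
    """
    data_right = sum(unpaired_outputs) / len(unpaired_outputs)
    loss = 0.0
    for p_data, p_model in ((data_right, p_right),
                            (1.0 - data_right, 1.0 - p_right)):
        d = optimal_discriminator(p_data, p_model)
        loss += p_model * -math.log(d + eps)
    return loss


def total_loss(p_right, unpaired_outputs, adv_weight=1.0):
    """Hypothetical combined objective: self-play plus adversarial shaping."""
    return (-self_play_reward(p_right)
            + adv_weight * adversarial_loss(p_right, unpaired_outputs))


# Unpaired data from a team that always drives on the right.
unpaired = [1, 1, 1, 1]
loss_right = total_loss(0.99, unpaired)
loss_left = total_loss(0.01, unpaired)
```

In this toy, self-play alone scores "always left" and "always right" equally, since both are optimal. The adversarial term breaks the tie toward the side seen in the unpaired outputs, so `loss_right` comes out lower than `loss_left`.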
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling · Adversarial Robustness in Machine Learning
