Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations

Sangwon Seo; Vaibhav V. Unhelkar

arXiv:2205.02959·cs.AI·September 21, 2022

TL;DR

This paper introduces BTIL, a Bayesian imitation learning algorithm that models team mental states to learn decentralized policies from suboptimal and limited demonstrations in multi-agent tasks.

Contribution

Unlike existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, which enables it to learn decentralized team policies from suboptimal, semi-supervised demonstrations.
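
To make the idea of inferring a time-varying mental state from partially labeled demonstrations concrete, here is a minimal Python sketch. It is not the BTIL algorithm: it only runs forward filtering over one agent's latent mental state, treats the policy and mental-state dynamics as already known, and clamps the belief wherever a label is available. The function name `filter_mental_states`, the array layouts, and all numbers in the toy example are assumptions made for illustration.

```python
import numpy as np


def filter_mental_states(states, actions, labels, policy, trans, prior):
    """Forward-filter a latent mental state along one demonstration.

    states, actions: integer sequences of equal length T.
    labels: length-T sequence; an int where the mental state is annotated,
        None where it is unlabeled (the semi-supervised case).
    policy[x, s, a]: assumed probability of action a in state s under
        mental state x (hypothetical layout).
    trans[x, x_next]: assumed mental-state transition probabilities.
    prior[x]: assumed initial distribution over mental states.
    Returns a (T, K) array of filtered beliefs p(x_t | data up to t).
    """
    T, K = len(states), len(prior)
    beliefs = np.zeros((T, K))
    belief = np.asarray(prior, dtype=float)
    for t in range(T):
        if t > 0:
            # Predict: propagate the belief through the mental-state dynamics.
            belief = belief @ trans
        # Update: weight each mental state by how well it explains the action.
        belief = belief * policy[:, states[t], actions[t]]
        if labels[t] is not None:
            # Supervised step: keep only the annotated mental state.
            mask = np.zeros(K)
            mask[labels[t]] = 1.0
            belief = belief * mask
        # Normalize (assumes the observed data has nonzero probability).
        belief = belief / belief.sum()
        beliefs[t] = belief
    return beliefs


# Toy example (all numbers are made up): 2 mental states, 1 task state,
# 2 actions. Only the first step carries a mental-state label.
policy = np.array([[[0.9, 0.1]],
                   [[0.2, 0.8]]])  # shape (K=2, S=1, A=2)
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
prior = np.array([0.5, 0.5])

beliefs = filter_mental_states(
    states=[0, 0, 0, 0],
    actions=[0, 0, 1, 1],
    labels=[0, None, None, None],
    policy=policy,
    trans=trans,
    prior=prior,
)
```

In this toy run the belief starts pinned to the labeled mental state and then shifts toward the other one once the observed actions change, which is the kind of time-varying inference the summary refers to. Learning the policy and dynamics themselves, and handling several teammates at once, are outside this sketch.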

Findings

1. BTIL outperforms existing methods on synthetic multi-agent tasks.

2. BTIL successfully learns from small, semi-supervised datasets.

3. BTIL effectively models the influence of mental states on team behavior.

Abstract

We present Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm to model the behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, thereby enabling learning of decentralized team policies from demonstrations of suboptimal teamwork. Further, to allow for sample- and label-efficient policy learning from small datasets, BTIL employs a Bayesian perspective and is capable of learning from semi-supervised demonstrations. We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members' (time-varying and potentially…
