A Bayesian Solution To The Imitation Gap

Risto Vuorio; Mattie Fellows; Cong Lu; Clémence Grislain; Shimon Whiteson

arXiv:2407.00495·cs.LG·July 2, 2024

Open Access

TL;DR

This paper introduces BIG, a Bayesian approach that uses inverse reinforcement learning to address the imitation gap caused by observability differences, enabling agents to explore effectively and learn optimal policies from demonstrations.

Contribution

The paper presents a Bayesian method that infers reward functions to overcome the imitation gap, allowing agents to explore and adapt in environments with partial observability.

Findings

01 BIG enables exploration at test time in the presence of an imitation gap.

02 BIG outperforms naive imitation learning in environments with observability differences.

03 The approach effectively learns optimal policies using expert demonstrations despite the imitation gap.

Abstract

In many real-world settings, an agent must learn to act in environments where no reward signal can be specified, but a set of expert demonstrations is available. Imitation learning (IL) is a popular framework for learning policies from such demonstrations. However, in some cases, differences in observability between the expert and the agent can give rise to an imitation gap such that the expert's policy is not optimal for the agent and a naive application of IL can fail catastrophically. In particular, if the expert observes the Markov state and the agent does not, then the expert will not demonstrate the information-gathering behavior needed by the agent but not the expert. In this paper, we propose a Bayesian solution to the Imitation Gap (BIG), first using the expert demonstrations, together with a prior specifying the cost of exploratory behavior that is not demonstrated, to infer a…
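The imitation gap described in the abstract can be illustrated with a small toy problem. The sketch below is not BIG and is not taken from the paper; the two-door environment, the `listen` action, and all reward values are hypothetical choices made only for illustration. The expert sees the hidden state and opens the correct door directly, so it never demonstrates listening. An agent that cannot see the state and only matches the expert's action frequencies does no better than guessing, while an agent that pays a small cost to gather information first does much better.

```python
import random

# Hypothetical toy values, chosen only for illustration.
REWARD_CORRECT = 1.0   # reward for opening the door that matches the hidden state
REWARD_WRONG = -1.0    # reward for opening the other door
LISTEN_COST = 0.1      # cost of the information-gathering action


def rollout_expert(rng):
    """The expert observes the hidden state and opens the matching door."""
    state = rng.choice([0, 1])
    action = state
    return REWARD_CORRECT if action == state else REWARD_WRONG


def rollout_naive_imitation(rng):
    """The agent cannot observe the state, so matching the expert's
    action frequencies amounts to opening each door half the time."""
    state = rng.choice([0, 1])
    action = rng.choice([0, 1])
    return REWARD_CORRECT if action == state else REWARD_WRONG


def rollout_information_gathering(rng):
    """The agent first listens (never demonstrated by the expert),
    which reveals the state, then opens the matching door."""
    state = rng.choice([0, 1])
    revealed = state  # listening reveals the hidden state in this toy setup
    action = revealed
    reward = REWARD_CORRECT if action == state else REWARD_WRONG
    return reward - LISTEN_COST


def average_return(rollout, episodes=10_000, seed=0):
    """Monte Carlo estimate of the expected return of a rollout function."""
    rng = random.Random(seed)
    return sum(rollout(rng) for _ in range(episodes)) / episodes


expert_return = average_return(rollout_expert)
imitation_return = average_return(rollout_naive_imitation)
info_return = average_return(rollout_information_gathering)

print(f"expert:                {expert_return:.2f}")
print(f"naive imitation:       {imitation_return:.2f}")
print(f"information gathering: {info_return:.2f}")
```

In this toy setting the information-gathering policy is the better choice for the agent even though it appears nowhere in the demonstrations, which is the situation the abstract describes.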

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Advanced Bandit Algorithms Research