A Bayesian Solution To The Imitation Gap
Risto Vuorio, Mattie Fellows, Cong Lu, Clémence Grislain, Shimon Whiteson

TL;DR
This paper introduces BIG, a Bayesian approach that uses inverse reinforcement learning to address the imitation gap caused by observability differences, enabling agents to explore effectively and learn optimal policies from demonstrations.
Contribution
The paper presents a Bayesian method that infers reward functions to overcome the imitation gap, allowing agents to explore and adapt in environments with partial observability.
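To illustrate what inferring a reward function from demonstrations can look like in a Bayesian setting, the sketch below computes a posterior over a small set of candidate reward functions in a one-step, two-action problem. It assumes a Boltzmann-rational expert whose action probabilities are proportional to exp(beta * r[a]). The candidate rewards, the rationality parameter `beta`, and the function name `reward_posterior` are hypothetical choices made for this illustration. They are not the paper's model or inference procedure.

```python
import math


def reward_posterior(candidates, prior, demos, beta=2.0):
    """Posterior over candidate reward vectors given demonstrated actions.

    Assumption (illustrative only): the expert chooses action a with
    probability proportional to exp(beta * r[a]), i.e. Boltzmann rationality.
    Prior probabilities must be strictly positive.
    """
    log_post = []
    for r, p in zip(candidates, prior):
        # Log normaliser of the Boltzmann policy under reward vector r.
        log_z = math.log(sum(math.exp(beta * ra) for ra in r))
        # Log-likelihood of all demonstrated actions under r.
        log_lik = sum(beta * r[a] - log_z for a in demos)
        log_post.append(math.log(p) + log_lik)
    # Normalise in log space for numerical stability.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Three hypothetical reward functions over two actions.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
prior = [1 / 3, 1 / 3, 1 / 3]
demos = [0, 0, 0, 1, 0]  # the expert mostly chooses action 0
post = reward_posterior(candidates, prior, demos)
```

Because the expert mostly picks action 0, most of the posterior mass falls on the first candidate. The indifferent candidate `[0.5, 0.5]` ranks second, and the candidate that prefers action 1 ranks last.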
Findings
BIG enables exploration at test time in the presence of an imitation gap.
BIG outperforms naive imitation learning in environments with observability differences.
The approach effectively learns optimal policies using expert demonstrations despite the imitation gap.
Abstract
In many real-world settings, an agent must learn to act in environments where no reward signal can be specified, but a set of expert demonstrations is available. Imitation learning (IL) is a popular framework for learning policies from such demonstrations. However, in some cases, differences in observability between the expert and the agent can give rise to an imitation gap such that the expert's policy is not optimal for the agent and a naive application of IL can fail catastrophically. In particular, if the expert observes the Markov state and the agent does not, then the expert will not demonstrate the information-gathering behavior needed by the agent but not the expert. In this paper, we propose a Bayesian solution to the Imitation Gap (BIG), first using the expert demonstrations, together with a prior specifying the cost of exploratory behavior that is not demonstrated, to infer a…
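The imitation gap can be made concrete with a toy example. It was constructed here for illustration and is not taken from the paper. A goal sits behind one of two doors with equal probability. The expert observes which door is correct and walks straight to it. The agent can only learn the correct door by first reading a sign, at a hypothetical cost `c`. Because the expert never reads the sign, imitation gives the agent no information-gathering behavior to copy, and cloning the expert's door choices reduces to a coin flip. The rewards, the cost value, and the function names below are all assumptions. The cost `c` only loosely corresponds to the exploration cost that, according to the abstract, BIG specifies through its prior.

```python
def expected_return_naive(goal_reward=1.0):
    """Agent imitates the expert's door choices without seeing the goal.

    Across episodes the expert opens each door equally often, and it never
    reads the sign. An agent that cannot observe the goal therefore ends up
    guessing, with a 50% chance of picking the correct door (wrong door = 0).
    """
    p_correct = 0.5
    return p_correct * goal_reward


def expected_return_explore(goal_reward=1.0, c=0.1):
    """Agent pays cost c to read the sign, then always opens the right door."""
    return goal_reward - c


def exploration_pays(goal_reward, c):
    """True when information gathering beats naive imitation in this toy."""
    return expected_return_explore(goal_reward, c) > expected_return_naive(goal_reward)


naive = expected_return_naive()
explore = expected_return_explore(c=0.1)
```

In this toy setting, reading the sign is worthwhile exactly when `c < goal_reward / 2`. A demonstrator who never needs the sign cannot show the agent that behavior.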
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Advanced Bandit Algorithms Research
