Fully General Online Imitation Learning
Michael K. Cohen, Marcus Hutter, Neel Nanda

TL;DR
This paper introduces a conservative Bayesian online imitation learner for non-resetting environments. When the imitator acts, the probability of any event that would have been unlikely under the demonstrator can be bounded, and the imitator queries the demonstrator less and less often over time.
Contribution
It develops a new Bayesian imitation learner that operates in fully general, non-resetting environments, providing formal bounds on event likelihoods and reducing demonstrator queries.
Findings
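Bounds, under the imitator, the probability of events that would have been unlikely under the demonstrator
Queries to the demonstrator become rapidly less frequent over time
Supports safety: a dangerous event that is unlikely under the demonstrator stays bounded in probability under the imitator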
Abstract
In imitation learning, imitators and demonstrators are policies for picking actions given past interactions with the environment. If we run an imitator, we probably want events to unfold similarly to the way they would have if the demonstrator had been acting the whole time. In general, one mistake during learning can lead to completely different events. In the special setting of environments that restart, existing work provides formal guidance in how to imitate so that events unfold similarly, but outside that setting, no formal guidance exists. We address a fully general setting, in which the (stochastic) environment and demonstrator never reset, not even for training purposes, and we allow our imitator to learn online from the demonstrator. Our new conservative Bayesian imitation learner underestimates the probabilities of each available action, and queries for more data with the…
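The idea of underestimating each action's probability can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's algorithm: the finite list of candidate demonstrator models, the `alpha` plausibility threshold, the function names, and the choice to query with whatever probability mass is left over (the abstract text above is cut off before it says how queries are triggered).

```python
import random


def conservative_action_probs(posterior, model_probs, alpha=0.1):
    """Underestimate each action's probability (illustrative assumption).

    posterior: list of posterior weights, one per candidate model.
    model_probs: list of dicts mapping action -> probability, one per model.
    alpha: hypothetical threshold; models whose weight is at least
           alpha * (largest weight) count as plausible.
    """
    top = max(posterior)
    plausible = [i for i, w in enumerate(posterior) if w >= alpha * top]
    actions = list(model_probs[0].keys())
    # Take the minimum over plausible models, so no plausible model
    # assigns the action a lower probability than the imitator does.
    probs = {a: min(model_probs[i][a] for i in plausible) for a in actions}
    # The minima sum to at most 1; the remainder is assumed to go to querying.
    query_prob = max(0.0, 1.0 - sum(probs.values()))
    return probs, query_prob


def bayes_update(posterior, model_probs, observed_action):
    """Standard Bayes update after seeing the demonstrator's action."""
    unnorm = [w * m[observed_action] for w, m in zip(posterior, model_probs)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observed action has zero probability under all models")
    return [u / total for u in unnorm]


def act_or_query(posterior, model_probs, rng, alpha=0.1):
    """Return (action, False) to act, or (None, True) to query."""
    probs, _ = conservative_action_probs(posterior, model_probs, alpha)
    r = rng.random()
    cumulative = 0.0
    for action, p in probs.items():
        cumulative += p
        if r < cumulative:
            return action, False
    return None, True  # leftover mass: ask the demonstrator


# Two hypothetical demonstrator models over two actions.
models = [{"left": 0.9, "right": 0.1}, {"left": 0.5, "right": 0.5}]
weights = [0.5, 0.5]
probs, query_prob = conservative_action_probs(weights, models)
weights_after = bayes_update(weights, models, "left")
choice, queried = act_or_query(weights, models, random.Random(0))
```

In this toy setup the two models disagree, so a sizeable share of probability goes to querying. Once the posterior concentrates on one model, the minima approach that model's own probabilities and the leftover mass shrinks, which matches the intuition that queries become rarer as the learner gains confidence.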
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics
