Fully General Online Imitation Learning

Michael K. Cohen; Marcus Hutter; Neel Nanda

arXiv:2102.08686 · cs.LG · October 5, 2022 · 1 cites

Open Access

TL;DR

This paper introduces a conservative Bayesian online imitation learner for non-resetting environments. It bounds the likelihood of unlikely events during imitation, which limits the probability of dangerous events, while queries to the demonstrator decrease rapidly over time.

Contribution

It develops a new Bayesian imitation learner that operates in fully general, non-resetting environments, providing formal bounds on event likelihoods and reducing demonstrator queries.

Findings

01 Bounds the likelihood of unlikely events during imitation

02 Queries to the demonstrator decrease rapidly over time

03 Ensures safety by limiting the probability of dangerous events

Abstract

In imitation learning, imitators and demonstrators are policies for picking actions given past interactions with the environment. If we run an imitator, we probably want events to unfold similarly to the way they would have if the demonstrator had been acting the whole time. In general, one mistake during learning can lead to completely different events. In the special setting of environments that restart, existing work provides formal guidance in how to imitate so that events unfold similarly, but outside that setting, no formal guidance exists. We address a fully general setting, in which the (stochastic) environment and demonstrator never reset, not even for training purposes, and we allow our imitator to learn online from the demonstrator. Our new conservative Bayesian imitation learner underestimates the probabilities of each available action, and queries for more data with the…
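
The conservative rule described in the abstract (underestimate each action's probability, query with the remaining probability) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes a finite set of candidate demonstrator models, and it assumes the underestimate is the minimum over models whose posterior weight is within a factor `alpha` of the best one. The names `conservative_action_probs`, `update_posterior`, `imitator_step`, and the parameter `alpha` are hypothetical.

```python
import random


def conservative_action_probs(models, posterior, history, actions, alpha=0.5):
    """Return underestimated action probabilities and the leftover query probability.

    Assumption: each model is a callable model(action, history) -> probability,
    and "plausible" models are those with posterior weight >= alpha * best weight.
    """
    best = max(posterior)
    plausible = [m for m, w in zip(models, posterior) if w >= alpha * best]
    # Underestimate: take the smallest probability any plausible model assigns.
    probs = {a: min(m(a, history) for m in plausible) for a in actions}
    # Whatever mass is left over is spent on asking the demonstrator.
    query_prob = max(0.0, 1.0 - sum(probs.values()))
    return probs, query_prob


def update_posterior(models, posterior, history, demo_action):
    """Bayesian update of the model weights on one observed demonstrator action."""
    weights = [w * m(demo_action, history) for m, w in zip(models, posterior)]
    total = sum(weights)
    if total == 0.0:
        # No model explains the observation; leave the weights unchanged.
        return list(posterior)
    return [w / total for w in weights]


def imitator_step(models, posterior, history, actions, demonstrator, rng, alpha=0.5):
    """Take one step: act alone, or query the demonstrator with the leftover mass."""
    probs, _ = conservative_action_probs(models, posterior, history, actions, alpha)
    u = rng.random()
    cumulative = 0.0
    for a in actions:
        cumulative += probs[a]
        if u < cumulative:
            return a, posterior, False  # acted without querying
    # The draw fell in the leftover mass: defer to the demonstrator and learn.
    demo_action = demonstrator(history)
    new_posterior = update_posterior(models, posterior, history, demo_action)
    return demo_action, new_posterior, True
```

For example, with two equally weighted models that assign (left, right) probabilities of (0.9, 0.1) and (0.6, 0.4), this sketch gives left 0.6 and right 0.1, and queries the demonstrator with the remaining probability 0.3. As the posterior concentrates on one model, the plausible set shrinks and the leftover query mass shrinks with it.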

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics