Causal Imitation Learning with Unobserved Confounders
Junzhe Zhang, Daniel Kumor, Elias Bareinboim

TL;DR
This paper investigates imitation learning in scenarios with unobserved confounders, providing a causal framework to determine when imitation is feasible and proposing methods to learn policies under these conditions.
Contribution
The paper introduces a non-parametric, graphical criterion for deciding whether imitation is feasible when the covariates used by the expert are not fully observed, and develops an efficient procedure for learning an imitating policy.
Findings
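A graphical criterion that is complete (both necessary and sufficient) for determining whether imitation is feasible.
When the criterion does not hold, imitation can still be feasible by exploiting quantitative knowledge of the expert trajectories.
An efficient procedure for learning the imitating policy from expert trajectories.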
Abstract
One of the common ways children learn is by mimicking adults. Imitation learning focuses on learning policies with suitable performance from demonstrations generated by an expert, with an unspecified performance measure, and unobserved reward signal. Popular methods for imitation learning start by either directly mimicking the behavior policy of an expert (behavior cloning) or by learning a reward function that prioritizes observed expert trajectories (inverse reinforcement learning). However, these methods rely on the assumption that covariates used by the expert to determine her/his actions are fully observed. In this paper, we relax this assumption and study imitation learning when sensory inputs of the learner and the expert differ. First, we provide a non-parametric, graphical criterion that is complete (both necessary and sufficient) for determining the feasibility of imitation…
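To make the role of the "fully observed covariates" assumption concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the binary variables `U` (a covariate only the expert sees), `X` (the action), the reward function, and all function names are hypothetical choices made for illustration. The sketch computes exact expectations by enumeration and compares the expert with a learner that clones the expert's action distribution without access to `U`.

```python
# Toy model (hypothetical, for illustration only; not the paper's example).
# U: binary covariate seen by the expert but hidden from the learner.
# X: binary action. Reward is 1 when the action matches U, else 0.

P_U = {0: 0.5, 1: 0.5}  # assumed prior over the hidden covariate


def reward(x, u):
    """Assumed reward: 1.0 if the action matches the hidden covariate."""
    return 1.0 if x == u else 0.0


def expert_policy(u):
    """P(X = x | U = u) for the expert, who simply copies u."""
    return {u: 1.0, 1 - u: 0.0}


def expected_reward(policy_given_u):
    """Exact expected reward of a policy, summing over U and X."""
    total = 0.0
    for u, p_u in P_U.items():
        for x, p_x in policy_given_u(u).items():
            total += p_u * p_x * reward(x, u)
    return total


def marginal_action_distribution(policy_given_u):
    """P(X = x) induced by the policy: all that a U-blind learner can observe."""
    dist = {0: 0.0, 1: 0.0}
    for u, p_u in P_U.items():
        for x, p_x in policy_given_u(u).items():
            dist[x] += p_u * p_x
    return dist


# Behavior cloning without U: reproduce the expert's marginal over actions.
cloned = marginal_action_distribution(expert_policy)


def cloned_policy(u):
    """The learner cannot see u, so its action distribution ignores it."""
    return dict(cloned)


expert_value = expected_reward(expert_policy)
cloned_value = expected_reward(cloned_policy)

print("expert expected reward:", expert_value)
print("cloned expected reward:", cloned_value)
```

In this toy setting the cloned policy matches the expert's action frequencies exactly yet earns a lower expected reward, because the expert's actions depend on a covariate the learner never sees. Any policy that ignores `U` earns the same expected reward here, so no amount of extra demonstration data closes the gap.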
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Machine Learning and Algorithms
