Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
Michelle Zhao, Reid Simmons, Henny Admoni, Aaditya Ramdas, Andrea Bajcsy

TL;DR
This paper introduces ConformalDAgger, a novel interactive imitation learning method that uses conformal prediction-based uncertainty quantification to adapt to expert policy shifts and intermittent feedback, improving learning efficiency.
Contribution
The paper develops intermittent quantile tracking (IQT), a new uncertainty quantification algorithm for intermittent labels, and integrates it into ConformalDAgger to improve active feedback querying during deployment.
Findings
ConformalDAgger detects high uncertainty during expert shifts.
It increases interventions to learn new behaviors faster.
It outperforms prior methods in simulated and real robotic tasks.
Abstract
In interactive imitation learning (IL), uncertainty quantification offers a way for the learner (i.e. robot) to contend with distribution shifts encountered during deployment by actively seeking additional feedback from an expert (i.e. human) online. Prior works use mechanisms like ensemble disagreement or Monte Carlo dropout to quantify when black-box IL policies are uncertain; however, these approaches can lead to overconfident estimates when faced with deployment-time distribution shifts. Instead, we contend that we need uncertainty quantification algorithms that can leverage the expert human feedback received during deployment time to adapt the robot's uncertainty online. To tackle this, we draw upon online conformal prediction, a distribution-free method for constructing prediction intervals online given a stream of ground-truth labels. Human labels, however, are intermittent in…
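As a rough illustration of the online conformal prediction idea mentioned in the abstract, the sketch below implements a basic quantile-tracking update on a stream of nonconformity scores. It is a minimal sketch under stated assumptions, not the paper's IQT algorithm: the function name `track_quantile`, the parameters `alpha`, `lr`, and `q0`, and the choice to simply skip the update when no expert label arrives (score `None`) are all illustrative assumptions.

```python
from typing import Iterable, List, Optional, Tuple


def track_quantile(
    scores: Iterable[Optional[float]],
    alpha: float = 0.1,
    lr: float = 0.05,
    q0: float = 0.0,
) -> Tuple[List[float], float]:
    """Basic online quantile tracking (hypothetical helper, not the paper's IQT).

    scores: nonconformity scores, e.g. |expert action - predicted action|;
            None marks a timestep where no expert label was observed.
    alpha:  target miscoverage rate.
    lr:     step size of the threshold update.
    Returns the threshold used at each timestep and the final threshold.
    """
    q = q0
    used = []
    for s in scores:
        used.append(q)  # threshold that defines the prediction interval at this step
        if s is None:
            # Assumption: with no label, leave the threshold unchanged.
            continue
        err = 1.0 if s > q else 0.0  # 1 if the label fell outside the interval
        q = q + lr * (err - alpha)   # widen after a miss, shrink slightly after a hit
    return used, q


if __name__ == "__main__":
    thresholds, final_q = track_quantile([1.0, None, 1.0, 0.0], alpha=0.1, lr=0.5)
    print(thresholds, final_q)
```

The threshold grows whenever an observed label falls outside the current interval and shrinks slowly otherwise, so a run of misses (as might follow a shift in the expert's behavior) widens the interval. A learner could use a large threshold as a signal to ask the expert for more feedback.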
Taxonomy
Topics: Innovative Teaching and Learning Methods · Human Motion and Animation · Robot Manipulation and Learning
Methods: Monte Carlo Dropout · Dropout
