Counterfactual Behavior Cloning: Offline Imitation Learning from Imperfect Human Demonstrations
Shahabedin Sagheb, Dylan P. Losey

TL;DR
Counterfactual Behavior Cloning (Counter-BC) improves offline imitation learning by inferring the intended policy behind imperfect human demonstrations, enabling robots to learn more accurately from noisy and suboptimal data.
Contribution
This work introduces Counter-BC, a novel method that extrapolates the intended behavior from noisy demonstrations, outperforming existing imitation learning techniques.
Findings
Counter-BC effectively extracts underlying policies from imperfect data.
Counter-BC outperforms state-of-the-art methods in noisy and real-world settings.
Counter-BC is theoretically proven to recover desired policies from diverse and imperfect demonstrations.
Abstract
Learning from humans is challenging because people are imperfect teachers. When everyday humans show the robot a new task they want it to perform, they inevitably make errors (e.g., inputting noisy actions) and provide suboptimal examples (e.g., overshooting the goal). Existing methods learn by mimicking the exact behaviors the human teacher provides -- but this approach is fundamentally limited because the demonstrations themselves are imperfect. In this work we advance offline imitation learning by enabling robots to extrapolate what the human teacher meant, instead of only considering what the human actually showed. We achieve this by hypothesizing that all of the human's demonstrations are trying to convey a single, consistent policy, while the noise and sub-optimality within their behaviors obfuscate the data and introduce unintentional complexity. To recover the underlying…
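The limitation of mimicking demonstrations exactly can be illustrated with a small sketch. The code below is not Counter-BC; it is plain behavior cloning on a hypothetical 1-D task, where the intended policy, the overshoot factor, the noise level, and all function names are assumptions made for illustration. It suggests that when a teacher systematically overshoots, a cloned policy reproduces the overshoot rather than the intended behavior.

```python
import numpy as np


def make_demos(n, intended_gain=-1.0, overshoot=1.3, noise_std=0.1, seed=0):
    """Generate imperfect demonstrations for a hypothetical 1-D task.

    Assumed intended policy: a = intended_gain * s (drive the state to zero).
    The simulated teacher scales every action by `overshoot` (suboptimal)
    and adds zero-mean Gaussian noise (noisy inputs).
    """
    rng = np.random.default_rng(seed)
    states = rng.uniform(-1.0, 1.0, size=n)
    noise = rng.normal(0.0, noise_std, size=n)
    actions = overshoot * intended_gain * states + noise
    return states, actions


def behavior_cloning_gain(states, actions):
    """Fit a linear policy a = k * s by least squares (standard behavior cloning)."""
    return float(states @ actions / (states @ states))


states, actions = make_demos(2000)
k_bc = behavior_cloning_gain(states, actions)
print(f"intended gain: -1.0, cloned gain: {k_bc:.3f}")
```

In this toy setting the zero-mean noise largely averages out, but the systematic overshoot does not: the cloned gain lands near the demonstrated behavior, not the intended one. Recovering the intended policy requires an additional assumption about what the teacher meant, which is the gap the abstract describes.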
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI
