Concurrent Training Improves the Performance of Behavioral Cloning from Observation
Zachary W. Robertson, Matthew R. Walter

TL;DR
This paper introduces BCO*, a modification of behavioral cloning from observation (BCO) that trains the inverse dynamics model and the policy concurrently, reducing the number of initial interactions required while achieving competitive imitation learning performance.
Contribution
The paper provides a theoretical analysis of BCO, proposes BCO* with concurrent training, and demonstrates improved sample efficiency and performance over existing methods.
Findings
BCO* reduces the number of initial interactions needed to learn the inverse dynamics model.
Concurrent training improves BCO performance.
BCO* achieves results competitive with state-of-the-art methods.
Abstract
Learning from demonstration is widely used as an efficient way for robots to acquire new skills. However, it typically requires that demonstrations provide full access to the state and action sequences. In contrast, learning from observation offers a way to utilize unlabeled demonstrations (e.g., video) to perform imitation learning. One approach to this is behavioral cloning from observation (BCO). The original implementation of BCO proceeds by first learning an inverse dynamics model and then using that model to estimate action labels, thereby reducing the problem to behavioral cloning. However, existing approaches to BCO require a large number of initial interactions in the first step. Here, we provide a novel theoretical analysis of BCO, introduce a modification BCO*, and show that in the semi-supervised setting, BCO* can concurrently improve both its estimate for the inverse…
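The two-stage pipeline that the abstract attributes to the original BCO (fit an inverse dynamics model from interactions, use it to label state-only demonstrations, then run behavioral cloning) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy dynamics `step`, the linear least-squares models, the expert rule `a = -0.5 * s`, and all function names. It shows the original sequential scheme only, not the concurrent training used by BCO*.

```python
import numpy as np

def step(s, a):
    # Hypothetical toy dynamics (assumption): next state = state + action.
    return s + a

def fit_linear(X, y):
    # Least-squares fit with a bias column; returns the weight matrix.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(w, X):
    # Apply a model fitted by fit_linear.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w

rng = np.random.default_rng(0)

# Stage 1: collect interactions with a random policy and fit an
# inverse dynamics model that maps (s, s') to the action taken.
s = rng.uniform(-1, 1, size=(200, 1))
a = rng.uniform(-1, 1, size=(200, 1))
s_next = step(s, a)
inv_w = fit_linear(np.hstack([s, s_next]), a)

# Stage 2: state-only demonstrations from a hypothetical expert
# (a = -0.5 * s); the actions themselves are never observed.
demo_s = rng.uniform(-1, 1, size=(100, 1))
demo_s_next = step(demo_s, -0.5 * demo_s)
inferred_a = predict_linear(inv_w, np.hstack([demo_s, demo_s_next]))

# Stage 3: behavioral cloning on (state, inferred action) pairs.
policy_w = fit_linear(demo_s, inferred_a)
print(policy_w.ravel())  # slope and bias of the cloned policy
```

In this noise-free linear toy problem the inverse dynamics model can recover the expert's actions closely, so the cloned policy should land near the expert rule. With richer dynamics, the quality of the inferred labels depends on how many interactions are collected in stage 1, which is the cost the abstract says existing BCO approaches incur.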
Taxonomy
Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Adversarial Robustness in Machine Learning
Methods: Generative Adversarial Imitation Learning
