On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning
Sacha Morin, Moonsub Byeon, Alexia Jolicoeur-Martineau, Sébastien Lachapelle

TL;DR
This paper investigates the sample efficiency of inverse dynamics models in semi-supervised imitation learning, revealing their advantages and proposing improvements to existing algorithms based on theoretical and experimental insights.
Contribution
It shows that VM-IDM and IDM labeling learn the same policy in a limit case (the IDM-based policy), and introduces an improved LAPO algorithm for latent action policy learning.
Findings
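VM-IDM and IDM labeling learn the same policy in a limit case, called the IDM-based policy.
The advantage of IDM-based policies over behavior cloning is attributed to the superior sample efficiency of IDM learning, which stems from a lower-complexity hypothesis class and less stochasticity.
An improved LAPO algorithm is proposed for better policy learning.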
Abstract
Semi-supervised imitation learning (SSIL) consists in learning a policy from a small dataset of action-labeled trajectories and a much larger dataset of action-free trajectories. Some SSIL methods learn an inverse dynamics model (IDM) to predict the action from the current state and the next state. An IDM can act as a policy when paired with a video model (VM-IDM) or as a label generator to perform behavior cloning on action-free data (IDM labeling). In this work, we first show that VM-IDM and IDM labeling learn the same policy in a limit case, which we call the IDM-based policy. We then argue that the previously observed advantage of IDM-based policies over behavior cloning is due to the superior sample efficiency of IDM learning, which we attribute to two causes: (i) the ground-truth IDM tends to be contained in a lower complexity hypothesis class relative to the expert policy, and…
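To make the IDM labeling pipeline described in the abstract concrete, here is a minimal sketch on a toy problem. It is not the paper's method or experimental setup: the linear dynamics, the noisy linear "expert", the dataset sizes, and names such as `rollout`, `W_idm`, and `W_pi` are all assumptions chosen for illustration. The sketch fits an IDM on a small labeled set, uses it to label a larger action-free set, and then runs behavior cloning on the union.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(n):
    """Sample n transitions (s, a, s') from a hypothetical toy system."""
    s = rng.normal(size=(n, 2))
    a = -0.5 * s + 0.1 * rng.normal(size=(n, 2))  # noisy toy "expert" policy
    s_next = s + a                                 # deterministic toy dynamics
    return s, a, s_next

# Small action-labeled dataset and a much larger action-free dataset.
s_lab, a_lab, sn_lab = rollout(20)
s_free, a_hidden, sn_free = rollout(2000)  # a_hidden is used only for evaluation

# 1) Fit a linear IDM, a ~ [s, s'] @ W_idm, on the labeled transitions.
X_lab = np.hstack([s_lab, sn_lab])
W_idm, *_ = np.linalg.lstsq(X_lab, a_lab, rcond=None)

# 2) IDM labeling: predict pseudo-actions for the action-free transitions.
a_pseudo = np.hstack([s_free, sn_free]) @ W_idm

# 3) Behavior cloning: fit a linear policy a ~ s @ W_pi on labeled + pseudo-labeled data.
S = np.vstack([s_lab, s_free])
A = np.vstack([a_lab, a_pseudo])
W_pi, *_ = np.linalg.lstsq(S, A, rcond=None)

# Largest pseudo-label error against the held-out true actions.
idm_label_error = float(np.abs(a_pseudo - a_hidden).max())
print("max IDM label error:", idm_label_error)
print("learned policy weights:\n", W_pi)
```

In this toy setting the action is a deterministic function of (s, s') even though the expert policy is noisy, so a handful of labeled transitions is enough to fit the IDM. This loosely mirrors the abstract's argument about sample efficiency, but only under the assumptions stated above.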
Taxonomy
Topics: Human Pose and Action Recognition · Reinforcement Learning in Robotics · Human Motion and Animation
