Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement
Chao Yang, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Huaping Liu, Junzhou Huang, Chuang Gan

TL;DR
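This paper introduces Inverse-Dynamics-Disagreement-Minimization (IDDM), an approach to imitation learning from observations that minimizes the disagreement between the imitator's and the expert's inverse dynamics models. Doing so narrows the gap to learning from demonstration, and the paper reports improved performance over other observation-only methods on benchmarks.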
Contribution
The paper provides a theoretical analysis of the difference between Learning from Observations (LfO) and Learning from Demonstration (LfD), and proposes a practical, model-free method that minimizes inverse dynamics disagreement to improve LfO performance.
Findings
IDDM reduces the gap between LfO and LfD.
Empirical results show IDDM outperforms other LfO methods.
The method is effective on challenging benchmarks.
Abstract
This paper studies Learning from Observations (LfO) for imitation learning with access to state-only demonstrations. In contrast to Learning from Demonstration (LfD) that involves both action and state supervision, LfO is more practical in leveraging previously inapplicable resources (e.g. videos), yet more challenging due to the incomplete expert guidance. In this paper, we investigate LfO and its difference with LfD in both theoretical and practical perspectives. We first prove that the gap between LfD and LfO actually lies in the disagreement of inverse dynamics models between the imitator and the expert, if following the modeling approach of GAIL. More importantly, the upper bound of this gap is revealed by a negative causal entropy which can be minimized in a model-free way. We term our method as Inverse-Dynamics-Disagreement-Minimization (IDDM) which enhances the conventional LfO…
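The claim that the LfD-LfO gap lies in inverse dynamics disagreement can be checked numerically on a toy problem. The sketch below is not the authors' code. It assumes KL divergence as the discrepancy measure (the summary above does not name one), a small tabular setting, normalized occupancy measures, and transition dynamics shared by imitator and expert. All names (`rho_sa_pi`, `rho_sa_exp`, `dynamics`, `inverse_dynamics_disagreement`) are hypothetical. Under those assumptions, the chain rule for KL divergence gives: KL over state-action pairs minus KL over state transitions equals the expected KL between the two inverse dynamics models.

```python
import numpy as np


def kl(p, q):
    """KL divergence between two strictly positive arrays that each sum to 1."""
    return float(np.sum(p * (np.log(p) - np.log(q))))


def inverse_dynamics_disagreement(rho_sa_pi, rho_sa_exp, dynamics):
    """Return (disagreement, KL over (s, a), KL over (s, s')).

    rho_sa_pi, rho_sa_exp: arrays of shape (S, A), normalized occupancy
        measures of imitator and expert (assumed strictly positive).
    dynamics: array of shape (S, A, S) holding P(s' | s, a), shared by both.
    """
    # Joint distribution over (s, a, s') induced by the shared dynamics.
    joint_pi = rho_sa_pi[:, :, None] * dynamics
    joint_exp = rho_sa_exp[:, :, None] * dynamics

    # Marginalize out the action to get the transition occupancy rho(s, s').
    trans_pi = joint_pi.sum(axis=1)
    trans_exp = joint_exp.sum(axis=1)

    # Inverse dynamics models rho(a | s, s').
    inv_pi = joint_pi / trans_pi[:, None, :]
    inv_exp = joint_exp / trans_exp[:, None, :]

    # Expected KL between the inverse dynamics models under the imitator.
    disagreement = float(np.sum(joint_pi * (np.log(inv_pi) - np.log(inv_exp))))
    return disagreement, kl(rho_sa_pi, rho_sa_exp), kl(trans_pi, trans_exp)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions = 4, 3
    # Random strictly positive distributions, for illustration only.
    rho_pi = rng.dirichlet(np.ones(n_states * n_actions)).reshape(n_states, n_actions)
    rho_exp = rng.dirichlet(np.ones(n_states * n_actions)).reshape(n_states, n_actions)
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

    d, kl_sa, kl_ss = inverse_dynamics_disagreement(rho_pi, rho_exp, P)
    print("LfD objective, KL over (s, a):  ", kl_sa)
    print("LfO objective, KL over (s, s'): ", kl_ss)
    print("gap (kl_sa - kl_ss):            ", kl_sa - kl_ss)
    print("inverse dynamics disagreement:  ", d)
```

The last two printed values should agree up to floating-point error. The random occupancy measures here are not required to be reachable by any policy; the identity only relies on both joints sharing the same `dynamics`.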
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning
