Imitation Learning from Suboptimal Demonstrations via Meta-Learning An Action Ranker
Jiangdong Fan, Hongcai He, Paul Weng, Hui Xu, Jie Shao

TL;DR
ILMAR is a novel imitation learning method that leverages both expert and suboptimal demonstrations through meta-learning an action ranker, significantly improving policy performance in scenarios with limited expert data.
Contribution
This paper introduces ILMAR, a meta-learning-based approach that makes effective use of suboptimal demonstrations by ranking them and selectively integrating them into policy learning.
Findings
ILMAR outperforms previous methods on various tasks.
It effectively utilizes suboptimal demonstrations.
The approach improves policy performance with limited expert data.
Abstract
A major bottleneck in imitation learning is the requirement of a large number of expert demonstrations, which can be expensive or inaccessible. Learning from supplementary demonstrations without strict quality requirements has emerged as a powerful paradigm to address this challenge. However, previous methods often fail to fully utilize their potential by discarding non-expert data. Our key insight is that even demonstrations that fall outside the expert distribution but outperform the learned policy can enhance policy performance. To utilize this potential, we propose a novel approach named imitation learning via meta-learning an action ranker (ILMAR). ILMAR implements weighted behavior cloning (weighted BC) on a limited set of expert demonstrations along with supplementary demonstrations. It utilizes the functional of the advantage function to selectively integrate knowledge from the…
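To make the weighted BC idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the scalar linear policy, the squared-error loss, the toy data, and the fixed weights are all assumptions chosen for illustration, and the helper names `weighted_bc_loss` and `fit_linear_policy` are hypothetical. In ILMAR the per-sample weights would come from the meta-learned action ranker rather than being set by hand.

```python
from typing import Callable, Sequence


def weighted_bc_loss(
    policy: Callable[[float], float],
    states: Sequence[float],
    actions: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted mean squared error between policy outputs and demonstrated actions."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_errors = (
        w * (policy(s) - a) ** 2 for s, a, w in zip(states, actions, weights)
    )
    return sum(weighted_errors) / total_weight


def fit_linear_policy(
    states: Sequence[float],
    actions: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Closed-form weighted least squares for a policy a = k * s (no bias term)."""
    numerator = sum(w * s * a for s, a, w in zip(states, actions, weights))
    denominator = sum(w * s * s for s, w in zip(states, weights))
    return numerator / denominator


# Toy data (assumed): the expert acts as a = 2s, the supplementary demo as a = s.
expert_states, expert_actions = [1.0, 2.0], [2.0, 4.0]
supp_states, supp_actions = [1.0, 2.0], [1.0, 2.0]

states = expert_states + supp_states
actions = expert_actions + supp_actions

# Hand-picked placeholder weights: full weight on expert samples,
# reduced weight on supplementary samples.
weights = [1.0, 1.0, 0.25, 0.25]

k = fit_linear_policy(states, actions, weights)
loss = weighted_bc_loss(lambda s: k * s, states, actions, weights)
print(f"fitted gain k = {k:.3f}, weighted BC loss = {loss:.3f}")
```

With these weights the fitted gain lands between the expert's gain of 2 and the supplementary gain of 1, closer to the expert. Setting the supplementary weights to zero recovers plain BC on the expert data alone, which is the baseline that weighted schemes of this kind try to improve on.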
Taxonomy
Topics: Machine Learning and Data Classification · Human Pose and Action Recognition · Adversarial Robustness in Machine Learning
Methods: Sparse Evolutionary Training
