Exploring Beyond-Demonstrator via Meta Learning-Based Reward Extrapolation
Mingqi Yuan, Man-On Pun

TL;DR
This paper introduces MLRE, a meta learning-based reward extrapolation method that effectively learns from limited demonstrations to outperform demonstrators, addressing data scarcity issues in imitation learning.
Contribution
The paper proposes a novel meta learning approach for reward extrapolation that requires fewer demonstrations and improves performance over existing methods.
Findings
MLRE outperforms similar algorithms in simulation tasks.
Effective with limited demonstration data.
Significant performance improvements demonstrated.
Abstract
Extrapolating beyond-demonstrator (BD) performance through imitation learning (IL) aims to learn from and subsequently outperform the demonstrator. To that end, a representative approach is to leverage inverse reinforcement learning (IRL) to infer a reward function from demonstrations before performing RL on the learned reward function. However, most existing reward extrapolation methods require massive numbers of demonstrations, making them difficult to apply to tasks with limited training data. To address this problem, one simple solution is to perform data augmentation to artificially generate more training data, which may incur severe inductive bias and policy performance loss. In this paper, we propose a novel meta learning-based reward extrapolation (MLRE) algorithm, which can effectively approximate the ground-truth rewards using limited demonstrations. More specifically,…
Taxonomy
Topics: Muscle activation and electromyography studies · Viral Infectious Diseases and Gene Expression in Insects · Reinforcement Learning in Robotics
