CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning

Sheng Yue; Guanbo Wang; Wei Shao; Zhaofeng Zhang; Sen Lin; Ju Ren; Junshan Zhang

arXiv:2302.04782·cs.LG·February 22, 2023·6 cites

TL;DR

CLARE introduces a conservative model-based approach for offline IRL that effectively reduces reward extrapolation errors by balancing exploitation of expert and diverse data with exploration of an estimated dynamics model.

Contribution

The paper proposes CLARE, a novel offline IRL algorithm that integrates conservatism and dynamics modeling to mitigate reward extrapolation errors and improve performance.

Findings

01. CLARE outperforms existing methods on MuJoCo tasks.

02. The learned reward function is highly instructive for subsequent learning.

03. CLARE effectively balances exploitation and exploration to reduce reward extrapolation error.

Abstract

This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL), namely the reward extrapolation error, where the learned reward function may fail to explain the task correctly and misguide the agent in unseen environments due to the intrinsic covariate shift. Leveraging both expert data and lower-quality diverse data, we devise a principled algorithm (namely CLARE) that solves offline IRL efficiently via integrating "conservatism" into a learned reward function and utilizing an estimated dynamics model. Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy, based on which we characterize the impact of covariate shift by examining subtle two-tier tradeoffs between the exploitation (on both expert and diverse data) and exploration (on the estimated dynamics model). We show that CLARE can provably…
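As a rough illustration of what "conservatism" in a learned reward can mean, the sketch below takes gradient steps on a linear reward r(s, a) = w · φ(s, a), raising the reward on expert and diverse dataset samples and lowering it on samples rolled out from an estimated dynamics model. This is not the paper's objective: the linear reward, the feature vectors, the weight `beta` on diverse data, the `(1 + beta)` weight on model rollouts, the L2 term, and all function names are assumptions made only for this example.

```python
from typing import List, Sequence

Vector = List[float]


def mean_features(batch: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty batch of feature vectors phi(s, a)."""
    dim = len(batch[0])
    return [sum(row[i] for row in batch) / len(batch) for i in range(dim)]


def reward(w: Sequence[float], phi: Sequence[float]) -> float:
    """Linear reward r(s, a) = w . phi(s, a) (an assumption of this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def conservative_reward_step(
    w: Sequence[float],
    expert: Sequence[Sequence[float]],
    diverse: Sequence[Sequence[float]],
    model: Sequence[Sequence[float]],
    beta: float = 0.5,   # hypothetical weight on lower-quality diverse data
    lr: float = 0.1,     # step size
    l2: float = 0.01,    # hypothetical regularizer that keeps w bounded
) -> Vector:
    """One gradient-ascent step on the illustrative objective
    E_expert[r] + beta * E_diverse[r] - (1 + beta) * E_model[r] - (l2 / 2) * |w|^2.
    """
    mu_e = mean_features(expert)
    mu_d = mean_features(diverse)
    mu_m = mean_features(model)
    new_w = []
    for wi, e, d, m in zip(w, mu_e, mu_d, mu_m):
        # Push reward up where the datasets have support, down on model rollouts.
        grad = e + beta * d - (1.0 + beta) * m - l2 * wi
        new_w.append(wi + lr * grad)
    return new_w


# Toy two-dimensional features: expert data favors feature 0,
# model rollouts drift toward feature 1.
expert = [[1.0, 0.0], [0.9, 0.1]]
diverse = [[0.5, 0.5], [0.4, 0.6]]
model = [[0.0, 1.0], [0.1, 0.9]]

w = [0.0, 0.0]
for _ in range(50):
    w = conservative_reward_step(w, expert, diverse, model)
```

In this toy setting the learned weights end up rewarding the feature that the expert data emphasizes and penalizing the one that appears mainly in model rollouts, which is the qualitative behavior a conservative reward is meant to have.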

Videos

CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning · SlidesLive

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
