When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning

Siliang Zeng; Chenliang Li; Alfredo Garcia; Mingyi Hong

arXiv:2302.07457 · cs.LG · March 1, 2024

TL;DR

This paper introduces a bi-level optimization framework for offline inverse reinforcement learning that accounts for model uncertainty, leading to more accurate reward estimation and outperforming existing methods in continuous control tasks.

Contribution

It proposes a novel bi-level optimization approach incorporating model uncertainty into offline IRL, with theoretical guarantees and superior empirical performance.

Findings

1. Outperforms state-of-the-art offline IRL and imitation learning methods.
2. Provides statistical and computational guarantees for reward estimation.
3. Demonstrates effectiveness on MuJoCo and D4RL benchmarks.

Abstract

Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving. However, the structure of an expert's preferences implicit in observed actions is closely linked to the expert's model of the environment dynamics (i.e., the "world" model). Thus, inaccurate models of the world obtained from finite data with limited coverage could compound inaccuracy in estimated rewards. To address this issue, we propose a bi-level optimization formulation of the estimation task wherein the upper level is likelihood maximization based upon a conservative model of the expert's policy (lower level). The…
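
For orientation, the bi-level structure described in the abstract can be written schematically as follows. This is only a sketch under stated assumptions, not necessarily the paper's exact formulation. The abstract says only that the upper level maximizes likelihood and that the lower level is a conservative model of the expert's policy. Everything else here is an illustrative assumption: the reward parameters \theta, the discount factor \gamma, the estimated dynamics \hat{P}, the uncertainty penalty U, and the entropy term \mathcal{H}.

```latex
% Schematic bi-level problem (all symbols are illustrative assumptions).
% Upper level: choose reward parameters \theta to maximize the
% likelihood of the expert demonstrations under the policy \pi_\theta.
\begin{align*}
  \max_{\theta} \quad
    & L(\theta) := \mathbb{E}_{(s_t, a_t) \sim \text{expert}}
      \Big[ \sum_{t \ge 0} \gamma^{t} \log \pi_{\theta}(a_t \mid s_t) \Big] \\
% Lower level: \pi_\theta is optimal for the reward r(\cdot,\cdot;\theta)
% in the estimated world model \hat{P}, with a penalty U that discourages
% state-action pairs where the model is uncertain (the "conservative" part).
  \text{s.t.} \quad
    & \pi_{\theta} \in \arg\max_{\pi} \;
      \mathbb{E}_{\pi, \hat{P}}
      \Big[ \sum_{t \ge 0} \gamma^{t}
        \big( r(s_t, a_t; \theta) - U(s_t, a_t)
              + \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \big) \Big]
\end{align*}
```

Read this way, the penalty U lowers the value of state-action pairs that the finite dataset covers poorly, which is one common way to make a model-based policy conservative.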

Code & Models

Videos

When Demonstrations meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics