When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning
Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong

TL;DR
This paper introduces a bi-level optimization framework for offline inverse reinforcement learning that accounts for model uncertainty, leading to more accurate reward estimation and outperforming existing methods in continuous control tasks.
Contribution
It proposes a novel bi-level optimization approach incorporating model uncertainty into offline IRL, with theoretical guarantees and superior empirical performance.
Findings
Outperforms state-of-the-art offline IRL and imitation learning methods.
Provides statistical and computational guarantees for reward estimation.
Demonstrates effectiveness on MuJoCo and D4RL benchmarks.
Abstract
Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving. However, the structure of an expert's preferences implicit in observed actions is closely linked to the expert's model of the environment dynamics (i.e., the "world" model). Thus, inaccurate models of the world obtained from finite data with limited coverage could compound inaccuracy in estimated rewards. To address this issue, we propose a bi-level optimization formulation of the estimation task wherein the upper level is likelihood maximization based upon a conservative model of the expert's policy (lower level). The…
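The bi-level structure described in the abstract can be sketched in symbols. This is an illustrative reading, not the paper's exact statement: the abstract above is truncated and defines none of this notation. Assumed here are reward parameters \theta, a demonstration set \mathcal{D}^E, a dynamics model \hat{P} estimated from offline data, a nonnegative uncertainty penalty U that makes the policy conservative, a discount factor \gamma, and an entropy term \mathcal{H} (a common maximum-entropy assumption that the abstract does not mention).

```latex
% Upper level: choose reward parameters that maximize the likelihood
% of the expert demonstrations under the induced policy \pi_\theta.
% Lower level: \pi_\theta is optimal for the estimated reward in the
% learned model \hat{P}, penalized where the model is uncertain.
% All symbols are assumptions made for illustration.
\begin{align}
\max_{\theta} \quad
  & L(\theta) := \mathbb{E}_{\tau \sim \mathcal{D}^E}
    \Big[ \sum_{t \ge 0} \gamma^{t} \log \pi_{\theta}(a_t \mid s_t) \Big] \\
\text{s.t.} \quad
  & \pi_{\theta} \in \arg\max_{\pi} \;
    \mathbb{E}_{\tau \sim (\pi, \hat{P})}
    \Big[ \sum_{t \ge 0} \gamma^{t}
      \big( r(s_t, a_t; \theta) - U(s_t, a_t)
            + \mathcal{H}(\pi(\cdot \mid s_t)) \big) \Big]
\end{align}
```

Under this reading, the penalty U lowers the value of state-action pairs that the offline data covers poorly, so errors in \hat{P} are less likely to be absorbed into the estimated reward.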
Taxonomy
Topics: Reinforcement Learning in Robotics
