Partial Identifiability and Misspecification in Inverse Reinforcement Learning
Joar Skalse, Alessandro Abate

TL;DR
This paper provides a comprehensive mathematical analysis of the challenges in inverse reinforcement learning, focusing on partial reward identifiability and the effects of behavioral model misspecification on reward inference.
Contribution
It characterizes reward ambiguity and misspecification effects across common IRL models, offering a framework and tools for analyzing IRL robustness and identifiability.
Findings
Fully characterizes reward ambiguity for common IRL models.
Provides necessary and sufficient conditions under which behavioural model misspecification leads to unsound reward inferences.
Introduces a framework for analyzing partial identifiability and robustness in IRL.
Abstract
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$. This problem is difficult, for several reasons. First of all, there are typically multiple reward functions which are compatible with a given policy; this means that the reward function is only *partially identifiable*, and that IRL contains a certain fundamental degree of ambiguity. Secondly, in order to infer $R$ from $\pi$, an IRL algorithm must have a *behavioural model* of how $\pi$ relates to $R$. However, the true relationship between human preferences and human behaviour is very complex, and practically impossible to fully capture with a simple model. This means that the behavioural model in practice will be *misspecified*, which raises the worry that it might lead to unsound inferences if applied to real-world data. In this paper, we provide a comprehensive mathematical…
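The partial identifiability described in the abstract can be illustrated with a small sketch. The toy MDP, the reward, and the potential function `PHI` below are all hypothetical and are not taken from the paper. The example uses potential shaping, a well-known reward transformation that leaves the optimal policy unchanged, to show two different reward functions that an observer of optimal behaviour could not tell apart.

```python
# Toy illustration (assumed setup, not the paper's): two distinct reward
# functions that induce the same optimal policy in a small deterministic MDP.

GAMMA = 0.9
STATES = [0, 1, 2]
ACTIONS = [0, 1]

# NEXT[s][a] is the (deterministic) successor state; chosen arbitrarily.
NEXT = {0: {0: 1, 1: 2}, 1: {0: 0, 1: 2}, 2: {0: 2, 1: 0}}

# Arbitrary potential function over states, used only for shaping.
PHI = {0: 5.0, 1: -3.0, 2: 0.5}


def reward(s, a, s_next):
    """Original reward: 1 for entering state 2, else 0."""
    return 1.0 if s_next == 2 else 0.0


def shaped_reward(s, a, s_next):
    """Potential-shaped reward: R + gamma * Phi(s') - Phi(s)."""
    return reward(s, a, s_next) + GAMMA * PHI[s_next] - PHI[s]


def q_value(r, v, s, a):
    """One-step lookahead value of taking action a in state s."""
    s_next = NEXT[s][a]
    return r(s, a, s_next) + GAMMA * v[s_next]


def optimal_policy(r, iters=500):
    """Run value iteration, then return the greedy policy as {state: action}."""
    v = {s: 0.0 for s in STATES}
    for _ in range(iters):
        v = {s: max(q_value(r, v, s, a) for a in ACTIONS) for s in STATES}
    return {s: max(ACTIONS, key=lambda a: q_value(r, v, s, a)) for s in STATES}


if __name__ == "__main__":
    pi_original = optimal_policy(reward)
    pi_shaped = optimal_policy(shaped_reward)
    print("rewards differ:", reward(0, 0, 1) != shaped_reward(0, 0, 1))
    print("policies match:", pi_original == pi_shaped)
```

Under the behavioural model "the demonstrator acts optimally", both rewards explain the same policy, so the data alone cannot distinguish them. This is only one instance of the ambiguity; the paper's characterisation covers further transformations and other behavioural models.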
Taxonomy
Topics: Reinforcement Learning in Robotics
