Identifiability in inverse reinforcement learning

Haoyang Cao; Samuel N. Cohen; Lukasz Szpruch

arXiv:2106.03498 · cs.LG · November 9, 2021 · 5 cites

TL;DR

This paper addresses the challenge of non-identifiability in inverse reinforcement learning by providing conditions under which reward functions can be uniquely recovered, especially with entropy regularization and varied environments.

Contribution

It offers a complete characterization of reward functions leading to a given policy and establishes conditions for reward recovery across different settings.

Findings

01

Given demonstrations of the same reward under two distinct discount factors, or under sufficiently different environments, the reward function is identifiable up to a constant.

02

Necessary and sufficient conditions for reconstructing time-homogeneous rewards are provided.

03

The results for finite horizons and for action-independent rewards generalize earlier work.

Abstract

Inverse reinforcement learning attempts to reconstruct the reward function in a Markov decision problem, using observations of agent actions. As already observed in Russell [1998], the problem is ill-posed, and the reward function is not identifiable, even under the presence of perfect information about optimal behavior. We provide a resolution to this non-identifiability for problems with entropy regularization. For a given environment, we fully characterize the reward functions leading to a given policy and demonstrate that, given demonstrations of actions for the same reward under two distinct discount factors, or under sufficiently different environments, the unobserved reward can be recovered up to a constant. We also give general necessary and sufficient conditions for reconstruction of time-homogeneous rewards on finite horizons, and for action-independent rewards, generalizing…
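
The non-identifiability described in the abstract can be illustrated numerically. The sketch below is not the authors' code. It assumes the standard soft (entropy-regularized) Bellman equation with temperature `lam`, a small random environment, and the reward family f(s, a) = lam * log pi(a|s) + v(s) - gamma * E[v(s') | s, a], which is a standard construction that appears consistent with the characterization the abstract mentions. All function and variable names are hypothetical. Under these assumptions, two rewards built from different choices of v yield the same optimal policy, even though they differ by more than a constant.

```python
import numpy as np


def soft_value_iteration(reward, transition, gamma, lam=1.0, iters=2000):
    """Soft (entropy-regularized) value iteration.

    reward: array (S, A); transition: array (S, A, S) with rows summing to 1.
    Returns the optimal soft policy (S, A) and soft value function (S,).
    """
    value = np.zeros(reward.shape[0])
    for _ in range(iters):
        q = reward + gamma * transition @ value  # shape (S, A)
        q_max = q.max(axis=1)
        # Numerically stable lam * log(sum_a exp(q / lam)).
        value = q_max + lam * np.log(
            np.exp((q - q_max[:, None]) / lam).sum(axis=1)
        )
    q = reward + gamma * transition @ value
    policy = np.exp((q - value[:, None]) / lam)
    policy /= policy.sum(axis=1, keepdims=True)
    return policy, value


def reward_from_policy(policy, transition, gamma, v, lam=1.0):
    """Build a reward for which `policy` is soft-optimal with value `v`.

    Assumed form: f(s, a) = lam * log pi(a|s) + v(s) - gamma * E[v(s')].
    """
    return lam * np.log(policy) + v[:, None] - gamma * transition @ v


# Hypothetical toy environment: 4 states, 3 actions, random dynamics.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.9

transition = rng.random((n_states, n_actions, n_states))
transition /= transition.sum(axis=2, keepdims=True)

target_policy = rng.random((n_states, n_actions)) + 0.1
target_policy /= target_policy.sum(axis=1, keepdims=True)

# Two arbitrary choices of the free function v give two different rewards.
v_a = rng.normal(size=n_states)
v_b = rng.normal(size=n_states)
reward_a = reward_from_policy(target_policy, transition, gamma, v_a)
reward_b = reward_from_policy(target_policy, transition, gamma, v_b)

policy_a, _ = soft_value_iteration(reward_a, transition, gamma)
policy_b, _ = soft_value_iteration(reward_b, transition, gamma)

# Both rewards reproduce the same policy, yet they are not a constant apart.
reward_gap = reward_a - reward_b
same_policy = np.allclose(policy_a, policy_b)
gap_spread = reward_gap.max() - reward_gap.min()
```

In this sketch `same_policy` indicates whether the two rewards induce the same behavior, and `gap_spread` measures how far the difference between the rewards is from being constant. Observing the same agent under a single environment and discount factor therefore cannot separate `reward_a` from `reward_b`, which is the gap the paper's second discount factor or second environment is meant to close.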

Videos

Identifiability in inverse reinforcement learning · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Adaptive Dynamic Programming Control