Revisiting Maximum Entropy Inverse Reinforcement Learning: New Perspectives and Algorithms

Aaron J. Snoswell; Surya P. N. Singh; Nan Ye

arXiv:2012.00889·cs.LG·June 8, 2021

TL;DR

This paper offers new perspectives and algorithms for MaxEnt IRL: a generalized formulation based on minimizing a KL-divergence, an exact inference algorithm for the marginals needed in learning, and a model-free learning algorithm, with the aim of more accurate and more scalable reward learning.

Contribution

The paper presents a KL-divergence-based formulation that unifies MaxEnt IRL and Relative Entropy IRL, an efficient and exact inference algorithm, and scalable implementations for real-world applications.

Findings

01 Exact inference improves reward learning accuracy

02 Algorithm scales to large real-world datasets

03 Unified view of MaxEnt and Relative Entropy IRL

Abstract

We provide new perspectives and inference algorithms for Maximum Entropy (MaxEnt) Inverse Reinforcement Learning (IRL), which provides a principled method to find a most non-committal reward function consistent with given expert demonstrations, among many consistent reward functions. We first present a generalized MaxEnt formulation based on minimizing a KL-divergence instead of maximizing an entropy. This improves the previous heuristic derivation of the MaxEnt IRL model (for stochastic MDPs), allows a unified view of MaxEnt IRL and Relative Entropy IRL, and leads to a model-free learning algorithm for the MaxEnt IRL model. Second, a careful review of existing inference algorithms and implementations showed that they approximately compute the marginals required for learning the model. We provide examples to illustrate this, and present an efficient and exact inference algorithm. Our…
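The link between maximizing an entropy and minimizing a KL-divergence can be illustrated on a toy example. The sketch below is not the paper's algorithm: it assumes a small, fully enumerated set of hypothetical trajectories with made-up feature vectors (`phi`) and weights (`theta`), and uses the standard exponential-form distribution p(τ) ∝ q(τ) exp(θ·φ(τ)) for a reference distribution q. With a uniform q over n trajectories, KL(p‖q) = log n − H(p), so minimizing the KL-divergence is the same as maximizing the entropy; a non-uniform q gives the relative-entropy variant. The formulation for stochastic MDPs in the paper may differ in its details.

```python
import math

def maxent_distribution(features, theta, base):
    """Return p(tau) proportional to base(tau) * exp(theta . phi(tau)).

    features: list of feature vectors, one per enumerated trajectory (hypothetical).
    theta:    reward weights (hypothetical).
    base:     reference distribution q over the same trajectories.
    """
    scores = [
        math.log(b) + sum(t * f for t, f in zip(theta, phi_tau))
        for phi_tau, b in zip(features, base)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

# Four hypothetical trajectories with two-dimensional feature counts.
phi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
theta = [0.5, -0.25]
uniform = [0.25] * 4

p = maxent_distribution(phi, theta, uniform)

# With a uniform reference, KL(p || q) and log(n) - H(p) coincide.
gap = kl_divergence(p, uniform) - (math.log(len(p)) - entropy(p))
```

Passing a non-uniform `base` to `maxent_distribution` changes only the reference distribution, which is one way to see how a single KL-based objective can cover both the maximum-entropy and relative-entropy cases.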

Code & Models

Repositories

aaronsnoswell/unimodal-irl
Official