TL;DR
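This paper offers new perspectives and inference algorithms for Maximum Entropy (MaxEnt) Inverse Reinforcement Learning (IRL): a generalized formulation that minimizes a KL-divergence instead of maximizing an entropy, and an efficient, exact inference algorithm for the marginals needed during learning. The reported benefits are more accurate reward learning and better scalability.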
Contribution
It presents a generalized MaxEnt IRL formulation based on minimizing a KL-divergence, which gives a unified view of MaxEnt IRL and Relative Entropy IRL, together with an exact inference algorithm and scalable implementations for real-world applications.
Findings
Exact inference improves reward learning accuracy
Algorithm scales to large real-world datasets
Unified view of MaxEnt and Relative Entropy IRL
Abstract
We provide new perspectives and inference algorithms for Maximum Entropy (MaxEnt) Inverse Reinforcement Learning (IRL), which provides a principled method to find a most non-committal reward function consistent with given expert demonstrations, among many consistent reward functions. We first present a generalized MaxEnt formulation based on minimizing a KL-divergence instead of maximizing an entropy. This improves the previous heuristic derivation of the MaxEnt IRL model (for stochastic MDPs), allows a unified view of MaxEnt IRL and Relative Entropy IRL, and leads to a model-free learning algorithm for the MaxEnt IRL model. Second, a careful review of existing inference algorithms and implementations showed that they approximately compute the marginals required for learning the model. We provide examples to illustrate this, and present an efficient and exact inference algorithm. Our…
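The relationship between the KL-divergence view and the entropy view can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes the textbook minimum-KL form, in which a distribution P over a small enumerated set of trajectories minimizes KL(P || Q) subject to matching expert feature expectations, which gives P(τ) ∝ Q(τ) exp(θ · f(τ)). With a uniform baseline Q, KL(P || Q) = log N − H(P), so minimizing the KL-divergence is the same as maximizing entropy; a non-uniform Q gives a Relative Entropy style model. The four trajectories, the feature vectors, the target expectations, and all function names are hypothetical and chosen only for illustration.

```python
import math

def gibbs(base, feats, theta):
    """Return P(tau) proportional to base(tau) * exp(theta . f(tau))."""
    weights = [q * math.exp(sum(t * f for t, f in zip(theta, fv)))
               for q, fv in zip(base, feats)]
    z = sum(weights)
    return [w / z for w in weights]

def expected_features(p, feats):
    """Feature expectations E_P[f] under distribution p."""
    dim = len(feats[0])
    return [sum(pi * fv[k] for pi, fv in zip(p, feats)) for k in range(dim)]

def fit_theta(base, feats, target, lr=0.5, steps=5000):
    """Gradient ascent on the concave dual; gradient = target - E_P[f]."""
    theta = [0.0] * len(target)
    for _ in range(steps):
        mu = expected_features(gibbs(base, feats, theta), feats)
        theta = [t + lr * (g - m) for t, g, m in zip(theta, target, mu)]
    return theta

def kl(p, q):
    """KL(p || q) for discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy of p in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Toy setup (all values are made up for illustration).
FEATS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # per-trajectory features
TARGET = [0.7, 0.4]                # assumed expert feature expectations
UNIFORM = [0.25] * 4               # uniform baseline: the MaxEnt case
BASELINE = [0.4, 0.3, 0.2, 0.1]    # non-uniform baseline: Relative Entropy style

p_u = gibbs(UNIFORM, FEATS, fit_theta(UNIFORM, FEATS, TARGET))
p_b = gibbs(BASELINE, FEATS, fit_theta(BASELINE, FEATS, TARGET))

if __name__ == "__main__":
    print("MaxEnt solution:       ", [round(x, 4) for x in p_u])
    print("Relative-entropy sol.: ", [round(x, 4) for x in p_b])
    print("KL(p_u || uniform) vs log N - H(p_u):",
          kl(p_u, UNIFORM), math.log(len(p_u)) - entropy(p_u))
```

Both fitted distributions match the same target feature expectations; they differ only in which baseline they stay close to. Real IRL problems cannot enumerate trajectories this way, which is why the inference step for the marginals matters in practice.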
