Inverse Reinforcement Learning with Explicit Policy Estimates

Navyata Sanghvi, Shinnosuke Usami, Mohit Sharma, Joachim Groeger, Kris Kitani

arXiv:2103.02863·cs.LG·March 5, 2021

TL;DR

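This paper shows that IRL methods developed independently in machine learning (Maximum Causal Entropy IRL) and in economics (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm) belong to one class of optimization problems with a common form of the objective, policy, and gradient. It uses this unified view to propose more efficient algorithms and to indicate which method suits which scenario.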

Contribution

The paper connects IRL methods based on entropy maximization with economic models that explain expert behavior through unobserved action shocks, and it introduces more efficient algorithms based on this unified view.

Findings

01. Unified IRL methods under a common optimization framework

02. Identified computational differences due to value function approximation

03. Proposed more efficient algorithms for specific IRL scenarios

Abstract

Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics. In particular, the method of Maximum Causal Entropy IRL is based on the perspective of entropy maximization, while related advances in the field of economics instead assume the existence of unobserved action shocks to explain expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm). In this work, we make previously unknown connections between these related methods from both fields. We achieve this by showing that they all belong to a class of optimization problems, characterized by a common form of the objective, the associated policy and the objective gradient. We demonstrate key computational and algorithmic differences which arise between the methods due to an approximation of…
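
To illustrate the kind of connection the abstract describes, the sketch below compares the two perspectives for a single state. It assumes the unobserved action shocks are drawn i.i.d. from a standard Gumbel (Type I extreme value) distribution, which is a common assumption in the discrete-choice literature and is not stated in the text above. Under that assumption, an agent that picks the action maximizing value plus shock chooses each action with softmax probability, which is also the form of an entropy-regularized policy. The action values in `q` and both function names are hypothetical and chosen only for this example.

```python
import numpy as np


def softmax_policy(q_values):
    """Entropy-regularized (soft) policy: pi(a) proportional to exp(Q(a))."""
    shifted = q_values - q_values.max()  # subtract the max for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()


def shock_choice_frequencies(q_values, n_samples=200_000, seed=0):
    """Empirical choice frequencies of an agent that maximizes Q(a) + shock(a).

    Assumption: shocks are i.i.d. standard Gumbel, one draw per action per sample.
    """
    rng = np.random.default_rng(seed)
    shocks = rng.gumbel(size=(n_samples, q_values.size))
    choices = np.argmax(q_values + shocks, axis=1)
    return np.bincount(choices, minlength=q_values.size) / n_samples


# Hypothetical action values for one state (illustrative only).
q = np.array([1.0, 0.5, -0.3])

pi_soft = softmax_policy(q)
pi_shock = shock_choice_frequencies(q)

print("softmax policy:        ", np.round(pi_soft, 3))
print("shock-based frequencies:", np.round(pi_shock, 3))
```

The two printed vectors should agree up to sampling noise. This single-state check only shows why the two viewpoints can produce the same policy form; it does not implement any of the algorithms named in the abstract.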

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning