Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

Gergely Neu; Csaba Szepesvari

arXiv:1206.5264 · cs.LG · June 26, 2012 · 157 cites

TL;DR

This paper proposes a gradient-based apprenticeship learning algorithm that infers a reward function from an expert's observed behavior so that the resulting optimal policy matches that behavior. In tests on two artificial domains, the authors found it more reliable and efficient than some previous methods.

Contribution

The paper presents a novel gradient algorithm for inverse reinforcement learning. It uses subdifferentials to handle the nonsmoothness of the mapping from reward parameters to policies, and natural gradients to handle the redundancy of that mapping.

Findings

01

More reliable than some previous methods

02

More efficient than some previous methods in two artificial domains

03

Effective in matching expert behavior

Abstract

In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find a reward function such that the resulting optimal policy matches well the expert's observed behavior. The main difficulty is that the mapping from the parameters to policies is both nonsmooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.
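
The two difficulties named in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm and does not reproduce either of its test domains: the 4-state chain, the discount factor, and every function name are assumptions made purely for illustration. It shows one familiar source of redundancy (a positive rescaling plus a constant shift of the reward leaves the greedy optimal policy unchanged) and one symptom of nonsmoothness (a hard count of disagreements with the expert changes in jumps as a reward parameter varies).

```python
# Illustrative sketch only: a hypothetical 4-state chain MDP,
# not a domain or method taken from the paper.

GAMMA = 0.9          # assumed discount factor
N_STATES = 4
ACTIONS = (-1, +1)   # index 0: move left, index 1: move right


def step(state, action):
    """Deterministic transition along the chain, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def q_values(reward, n_iters=500):
    """Run value iteration and return Q[s][a] for a state-based reward."""
    v = [0.0] * N_STATES
    for _ in range(n_iters):
        v = [max(reward[s] + GAMMA * v[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    return [[reward[s] + GAMMA * v[step(s, a)] for a in ACTIONS]
            for s in range(N_STATES)]


def greedy_policy(reward):
    """Index of the greedy action in each state (ties go to the lower index)."""
    q = q_values(reward)
    return [max(range(len(ACTIONS)), key=lambda i: q[s][i])
            for s in range(N_STATES)]


def mismatch(reward, expert_policy):
    """Number of states where the greedy policy disagrees with the expert."""
    return sum(a != b for a, b in zip(greedy_policy(reward), expert_policy))


# The "expert" is taken to be optimal for a reward that pays only in the last state.
expert_reward = [0.0, 0.0, 0.0, 1.0]
expert_policy = greedy_policy(expert_reward)

# Redundancy: a positively rescaled and shifted reward induces the same policy.
rescaled = [2.0 * r + 5.0 for r in expert_reward]
same_policy = greedy_policy(rescaled) == expert_policy

# Nonsmoothness: the mismatch count is integer-valued, so it moves in jumps
# as the single reward parameter theta (reward of state 0) is varied.
losses = {theta: mismatch([theta, 0.0, 0.0, 1.0], expert_policy)
          for theta in (0.0, 0.5, 1.0, 2.0, 100.0)}

print(same_policy, losses)
```

Because many reward parameters map to the same policy, a plain gradient step in parameter space can move the parameters without changing the behavior they induce. This is the kind of redundancy that, according to the abstract, the authors address with natural gradients.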

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning