Option Compatible Reward Inverse Reinforcement Learning

Rakhoon Hwang; Hanjin Lee; Hyung Ju Hwang

arXiv:1911.02723 · cs.LG · January 20, 2021

TL;DR

This paper introduces a hierarchical inverse reinforcement learning (IRL) method within the options framework. It uses temporal abstraction and the intrinsic motivation implicit in expert demonstrations to recover reward functions, and it remains robust when the demonstrations are noisy.

Contribution

The paper presents a gradient-based approach to IRL with parametrized options and uses a second-order optimality condition on the option parameters to select the reward function. The recovered rewards improve transfer learning and robustness to noisy demonstrations.

Findings

01. Recovered rewards accelerate transfer learning.

02. The method is robust to noisy expert demonstrations.

03. The method is effective in both discrete and continuous domains.

Abstract

Reinforcement learning in complex environments is a challenging problem. In particular, the success of reinforcement learning algorithms depends on a well-designed reward function. Inverse reinforcement learning (IRL) solves the problem of recovering reward functions from expert demonstrations. In this paper, we solve a hierarchical inverse reinforcement learning problem within the options framework, which allows us to utilize intrinsic motivation of the expert demonstrations. A gradient method for parametrized options is used to deduce a defining equation for the Q-feature space, which leads to a reward feature space. Using a second-order optimality condition for option parameters, an optimal reward function is selected. Experimental results in both discrete and continuous domains confirm that our recovered rewards provide a solution to the IRL problem using temporal abstraction, which…
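
The first-order step described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's option-level formulation: it assumes a flat (non-hierarchical) two-state, two-action MDP with a tabular softmax policy, and every name and number in it (`q_feature_basis`, `reward_features`, `pi`, `d_state`, `P`, `gamma`) is hypothetical. It treats the vanishing policy gradient, sum over (s, a) of d(s, a) ∇ log π(a|s) Q(s, a) = 0, as a linear constraint on Q, takes the null space of that constraint as a Q-feature basis, and maps each Q-feature to a reward feature through the Bellman relation R = Q − γ P_π Q. The second-order selection step is not shown.

```python
import numpy as np

def q_feature_basis(grad_log_pi, occupancy, tol=1e-10):
    """Orthonormal basis of all Q with sum_sa d(sa) * grad_log_pi(sa) * Q(sa) = 0."""
    # Each row of A is one first-order condition (one per policy parameter).
    A = grad_log_pi.T * occupancy                      # (n_params, n_sa)
    _, sing, vt = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(sing > tol))
    return vt[rank:].T                                 # (n_sa, n_sa - rank)

def reward_features(q_basis, P_pi, gamma):
    """Map Q-features to reward features via R = Q - gamma * P_pi @ Q."""
    return q_basis - gamma * (P_pi @ q_basis)

# Toy setup (all numbers are illustrative assumptions).
pi = np.array([[0.7, 0.3], [0.4, 0.6]])                # pi[s, a]
d_state = np.array([0.6, 0.4])                         # assumed state weights
occupancy = (d_state[:, None] * pi).ravel()            # order: (s0,a0),(s0,a1),(s1,a0),(s1,a1)
n_s, n_a = pi.shape

# Gradient of log softmax with one parameter per (state, action) pair.
grad_log_pi = np.zeros((n_s * n_a, n_s * n_a))
for s in range(n_s):
    for a in range(n_a):
        row = s * n_a + a
        grad_log_pi[row, s * n_a:(s + 1) * n_a] = -pi[s]
        grad_log_pi[row, row] += 1.0

# Transition kernel P[s, a, s'] and the induced state-action kernel under pi.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
P_pi = (P.reshape(n_s * n_a, n_s)[:, :, None] * pi[None, :, :]).reshape(n_s * n_a, n_s * n_a)

gamma = 0.9
Q_basis = q_feature_basis(grad_log_pi, occupancy)
R_basis = reward_features(Q_basis, P_pi, gamma)
```

In this toy case the constraint forces Q(s, a0) = Q(s, a1) in each state, so the Q-feature space is two-dimensional. Any linear combination of the columns of `R_basis` is a reward under which the assumed policy has zero policy gradient; choosing among those combinations is where a second-order criterion would enter.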
