Option Compatible Reward Inverse Reinforcement Learning
Rakhoon Hwang, Hanjin Lee, Hyung Ju Hwang

TL;DR
This paper introduces a hierarchical inverse reinforcement learning method within the options framework. It uses the intrinsic motivation and temporal abstraction present in expert demonstrations to recover reward functions, including from noisy demonstrations.
Contribution
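The paper presents a gradient-based approach to IRL with parametrized options: a gradient method yields a defining equation for the Q-feature space, and a second-order optimality condition on the option parameters selects the reward function. The recovered rewards support transfer learning and are robust to noisy demonstrations.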
Findings
Recovered rewards accelerate transfer learning.
The method is robust to noisy expert demonstrations.
The method is effective in both discrete and continuous domains.
Abstract
Reinforcement learning in complex environments is a challenging problem. In particular, the success of reinforcement learning algorithms depends on a well-designed reward function. Inverse reinforcement learning (IRL) solves the problem of recovering reward functions from expert demonstrations. In this paper, we solve a hierarchical inverse reinforcement learning problem within the options framework, which allows us to utilize intrinsic motivation of the expert demonstrations. A gradient method for parametrized options is used to deduce a defining equation for the Q-feature space, which leads to a reward feature space. Using a second-order optimality condition for option parameters, an optimal reward function is selected. Experimental results in both discrete and continuous domains confirm that our recovered rewards provide a solution to the IRL problem using temporal abstraction, which…
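One way to picture the Q-feature step described in the abstract is its flat-policy analogue: for a parametrized policy, a vanishing policy gradient means the Q-function, viewed as a vector over state-action pairs, lies in the null space of a matrix built from the score function and the occupancy weights. The sketch below computes such a null-space basis with NumPy. It is an illustration under assumptions, not the paper's option-level derivation: the function name `q_feature_basis`, the tabular setting, and the random inputs are all hypothetical.

```python
import numpy as np


def q_feature_basis(grad_log_pi, occupancy, tol=1e-10):
    """Orthonormal basis of {q : grad_log_pi.T @ diag(occupancy) @ q = 0}.

    grad_log_pi: (n, k) array, row i is the score vector at state-action pair i.
    occupancy:   (n,) array of positive occupancy weights.
    Both inputs are assumed given; estimating them from demonstrations
    is outside the scope of this sketch.
    """
    # Scale column j of grad_log_pi.T by occupancy[j]; result has shape (k, n).
    A = grad_log_pi.T * occupancy
    # Full SVD so that vt spans all of R^n, including the null space of A.
    _, s, vt = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    # Rows of vt beyond the numerical rank span the null space of A.
    return vt[rank:].T


# Hypothetical toy inputs: 6 state-action pairs, 2 policy parameters.
rng = np.random.default_rng(0)
grad_log_pi = rng.normal(size=(6, 2))
occupancy = rng.random(6) + 0.1
occupancy /= occupancy.sum()

A = grad_log_pi.T * occupancy
basis = q_feature_basis(grad_log_pi, occupancy)
```

Each column of `basis` is a candidate Q-feature satisfying the vanishing-gradient condition in this toy setting. In the paper, the corresponding construction is carried out for option parameters, and a second-order optimality condition then selects a reward from the resulting feature space.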
