OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

Hana Hoshino; Kei Ota; Asako Kanezaki; Rio Yokota

arXiv:2109.04307 · cs.LG · May 24, 2022

TL;DR

OPIRL is a sample-efficient off-policy IRL method. It needs fewer environment interactions than prior on-policy IRL methods, learns transferable reward functions, and generalizes well across different environments and tasks.

Contribution

The paper proposes OPIRL, a novel off-policy IRL algorithm that improves sample efficiency and reward generalization compared to prior on-policy methods.

Findings

1. Requires significantly fewer environment interactions than prior on-policy IRL methods
2. Achieves comparable or better policy performance
3. Learns reward functions that generalize across tasks and dynamics

Abstract

Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) adopts an off-policy data distribution instead of an on-policy one, significantly reducing the number of interactions with the environment, (2) learns a stationary reward function that transfers to changing dynamics with high generalization capability, and (3) leverages mode-covering behavior for faster convergence. We demonstrate that our method is considerably more sample efficient and generalizes to novel environments…
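
To make point (1) concrete, here is a minimal sketch of what "off-policy" means for reward learning: the discriminator that shapes the reward compares expert states against states sampled from a replay buffer that holds data from all past policies, rather than against fresh rollouts of the current policy. It also uses a state-only reward r(s), one common way to get a reward that does not depend on actions and can therefore be reused under changed dynamics. This is an assumption-laden illustration in the style of off-policy adversarial imitation (a logistic-regression discriminator with a linear reward); it is not OPIRL's actual distribution-matching objective. All names (ReplayBuffer, reward, discriminator_step, demo) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ReplayBuffer:
    """Hypothetical FIFO buffer of states collected by *all* past policies (off-policy data)."""

    def __init__(self, capacity, state_dim):
        self.states = np.zeros((capacity, state_dim))
        self.capacity = capacity
        self.size = 0
        self.ptr = 0

    def add(self, state):
        self.states[self.ptr] = state
        self.ptr = (self.ptr + 1) % self.capacity  # overwrite the oldest entry when full
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, rng):
        idx = rng.integers(0, self.size, size=batch_size)
        return self.states[idx]


def reward(w, b, states):
    """Assumed linear state-only reward r(s) = w . s + b (no action input)."""
    return states @ w + b


def discriminator_step(w, b, expert_states, buffer_states, lr=0.1):
    """One gradient step on binary cross-entropy with D(s) = sigmoid(r(s)).

    Expert states get label 1, replay-buffer states get label 0, so r(s)
    rises on expert-like states. The gradient w.r.t. each logit is D - label.
    """
    g_exp = sigmoid(reward(w, b, expert_states)) - 1.0
    g_buf = sigmoid(reward(w, b, buffer_states))
    grad_w = expert_states.T @ g_exp / len(expert_states) + buffer_states.T @ g_buf / len(buffer_states)
    grad_b = g_exp.mean() + g_buf.mean()
    return w - lr * grad_w, b - lr * grad_b


def demo(seed=0, steps=200):
    """Toy 1-D example: buffer states cluster near -2, expert states near +2."""
    rng = np.random.default_rng(seed)
    buffer = ReplayBuffer(capacity=1000, state_dim=1)
    for s in rng.normal(-2.0, 0.5, size=(500, 1)):
        buffer.add(s)
    expert = rng.normal(2.0, 0.5, size=(200, 1))
    w, b = np.zeros(1), 0.0
    for _ in range(steps):
        expert_batch = expert[rng.integers(0, len(expert), size=64)]
        w, b = discriminator_step(w, b, expert_batch, buffer.sample(64, rng))
    r = reward(w, b, np.array([[2.0], [-2.0]]))
    return float(r[0]), float(r[1])


if __name__ == "__main__":
    r_expert, r_buffer = demo()
    print(f"r(expert-like state) = {r_expert:.2f}, r(buffer-like state) = {r_buffer:.2f}")
```

Because the buffer is reused across updates, each environment step can contribute to many reward updates, which is the intuition behind the reduced interaction count. A full method would also train the policy on the learned reward with an off-policy RL algorithm; that part is omitted here.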

Code & Models

Repositories

sff1019/opirl (TensorFlow, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials