Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Benjamin Eysenbach, Xinyang Geng, Sergey Levine, and Ruslan Salakhutdinov

arXiv:2002.11089 · cs.LG · February 26, 2020 · 30 cites

TL;DR

This paper shows that hindsight relabeling is a form of inverse reinforcement learning and uses inverse RL to relabel past experience, improving sample efficiency in multi-task RL across goal-reaching, discrete-reward, and linear-reward settings.

Contribution

It demonstrates that inverse RL can be integrated with relabeling techniques to improve sample efficiency in multi-task RL settings.

Findings

01 Inverse RL-based relabeling accelerates learning in multi-task RL.

02 Effective across goal-reaching and reward-structured domains.

03 Generalizes goal-relabeling to arbitrary task classes.

Abstract

Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically ask: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? In this paper, we show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary classes of tasks. Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings, including goal-reaching, domains with discrete sets of rewards, and those with linear reward functions.
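
The relabeling question in the abstract ("for what task was it optimal?") can be illustrated with a minimal sketch. It assumes a finite set of candidate tasks and a maximum-entropy-style posterior in which a trajectory's score for a task is its return under that task minus a per-task log normalizer. The function names, the batch-based estimate of the normalizer, and the uniform default prior are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def relabel_distribution(returns, log_prior=None):
    """Posterior over tasks for each trajectory (illustrative sketch).

    returns[i, j] is the cumulative reward of trajectory i evaluated
    under the reward function of task j (hypothetical input layout).
    """
    returns = np.asarray(returns, dtype=float)
    n_traj, n_tasks = returns.shape
    if log_prior is None:
        log_prior = np.zeros(n_tasks)  # assumed uniform prior over tasks

    # Assumption: estimate each task's log normalizer from the batch as
    # log mean_i exp(returns[i, j]), computed stably by subtracting the max.
    col_max = returns.max(axis=0)
    log_z = col_max + np.log(np.exp(returns - col_max).mean(axis=0))

    # Score = return under the task, minus the task's normalizer, plus prior.
    logits = returns - log_z + log_prior

    # Row-wise softmax, again shifted by the max for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def sample_relabels(returns, rng):
    """Draw one relabeled task index per trajectory from the posterior."""
    probs = relabel_distribution(returns)
    n_tasks = probs.shape[1]
    return np.array([rng.choice(n_tasks, p=p) for p in probs])
```

For instance, calling `relabel_distribution(np.array([[5.0, 100.0], [0.0, 100.0]]))` gives a case where the second task pays every trajectory the same large return. Subtracting the per-task normalizer cancels that constant, so the first trajectory is still more likely to be assigned to the first task, where it does comparatively well. Without the normalizer, every trajectory would be relabeled with whichever task hands out the largest rewards.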

Code & Models

Repositories

Banmahhhh/HIPI-RL
pytorch

Videos

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research