PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning

Angelos Filos; Clare Lyle; Yarin Gal; Sergey Levine; Natasha Jaques; Gregory Farquhar

arXiv:2102.12560 · cs.LG · June 11, 2021 · 5 cites

TL;DR

PsiPhi-Learning is a reinforcement learning framework that uses successor features and inverse temporal difference learning to learn from demonstrations without reward labels, improving transfer and imitation.

Contribution

The paper proposes a multi-task IRL algorithm, inverse temporal difference learning (ITD), that learns shared features and agent-specific preferences from demonstrations without reward information, and integrates it into RL with demonstrations.

Findings

01 Effective at improving RL, IRL, and imitation performance
02 Enables few-shot transfer to new tasks
03 Provides theoretical bounds for zero-shot transfer

Abstract

We study reinforcement learning (RL) with no-reward demonstrations, a setting in which an RL agent has access to additional data from the interaction of other agents with the same environment. However, it has no access to the rewards or goals of these agents, and their objectives and levels of expertise may vary widely. These assumptions are common in multi-agent settings, such as autonomous driving. To effectively use this data, we turn to the framework of successor features. This allows us to disentangle shared features and dynamics of the environment from agent-specific rewards and policies. We propose a multi-task inverse reinforcement learning (IRL) algorithm, called inverse temporal difference learning (ITD), that learns shared state features, alongside per-agent successor features and preference vectors, purely from demonstrations without reward labels. We further show how…
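The disentangling described in the abstract rests on the standard successor-feature identity: if rewards are linear in shared features, r(s, a) = phi(s, a) · w, then the action values of a fixed policy are Q(s, a) = psi(s, a) · w, where psi accumulates discounted features under that policy. The sketch below illustrates only that identity on a toy three-state chain; it is not the paper's ITD algorithm. The environment, the feature map `phi`, the policy, and the preference vectors for `agent_a` and `agent_b` are all hypothetical and chosen for illustration.

```python
GAMMA = 0.9
N_STATES = 3
ACTIONS = (0, 1)  # 0 = stay, 1 = move right (wraps around); hypothetical toy chain


def step(s, a):
    """Deterministic transition of the toy chain."""
    return (s + a) % N_STATES


def phi(s, a):
    """Hypothetical shared features: [transition lands in state 2, agent moved]."""
    return [1.0 if step(s, a) == 2 else 0.0, float(a)]


def policy(s):
    """Fixed policy being evaluated: always move right."""
    return 1


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def successor_features(n_iters=500):
    """Fixed-point iteration of psi(s,a) = phi(s,a) + gamma * psi(s', pi(s'))."""
    dim = len(phi(0, 0))
    psi = {(s, a): [0.0] * dim for s in range(N_STATES) for a in ACTIONS}
    for _ in range(n_iters):
        new_psi = {}
        for (s, a) in psi:
            s2 = step(s, a)
            nxt = psi[(s2, policy(s2))]
            new_psi[(s, a)] = [f + GAMMA * n for f, n in zip(phi(s, a), nxt)]
        psi = new_psi
    return psi


def q_direct(w, n_iters=500):
    """Ordinary policy evaluation with reward r(s,a) = phi(s,a) . w, for comparison."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(n_iters):
        q = {
            (s, a): dot(phi(s, a), w) + GAMMA * q[(step(s, a), policy(step(s, a)))]
            for (s, a) in q
        }
    return q


# psi depends on the environment and the policy, not on any agent's reward.
PSI = successor_features()

# Hypothetical per-agent preference vectors over the shared features.
W_AGENTS = {"agent_a": [1.0, 0.0], "agent_b": [1.0, -0.5]}

# Each agent's Q-values are one dot product away from the same psi.
Q = {name: {sa: dot(PSI[sa], w) for sa in PSI} for name, w in W_AGENTS.items()}
```

In this sketch `PSI` is computed once and reused for both preference vectors, which is the sense in which the shared part (features and dynamics) is separated from the agent-specific part (the vector w).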

Code & Models

Repositories

filangelos/social_rl (JAX, official)

Videos

PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Adaptive Dynamic Programming Control