PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar

TL;DR
PsiPhi-Learning is a reinforcement learning framework that uses successor features and inverse temporal difference learning to exploit demonstrations without reward labels, improving transfer and imitation.
Contribution
The paper proposes inverse temporal difference learning (ITD), a multi-task IRL algorithm that learns shared features and agent-specific preferences from demonstrations without reward information, and integrates it into RL with demonstrations.
Findings
Improves RL, IRL, and imitation performance
Enables few-shot transfer to new tasks
Provides theoretical bounds for zero-shot transfer
Abstract
We study reinforcement learning (RL) with no-reward demonstrations, a setting in which an RL agent has access to additional data from the interaction of other agents with the same environment. However, it has no access to the rewards or goals of these agents, and their objectives and levels of expertise may vary widely. These assumptions are common in multi-agent settings, such as autonomous driving. To effectively use this data, we turn to the framework of successor features. This allows us to disentangle shared features and dynamics of the environment from agent-specific rewards and policies. We propose a multi-task inverse reinforcement learning (IRL) algorithm, called inverse temporal difference learning (ITD), that learns shared state features, alongside per-agent successor features and preference vectors, purely from demonstrations without reward labels. We further show how…
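The disentanglement the abstract describes rests on the standard successor-feature decomposition: if an agent's reward is linear in shared features, r = phi · w, then its action values are Q = psi · w, where psi is the expected discounted sum of phi under that agent's policy. The sketch below checks this identity on a small random tabular MDP. It is only an illustration of the decomposition, not the authors' ITD algorithm, which learns these quantities from demonstrations; the tabular setting, the closed-form solve, and all names (P, policy, policy_transition_matrix, successor_features) are assumptions made for the example.

```python
import numpy as np


def policy_transition_matrix(P, policy):
    """(s, a) -> (s', a') transition matrix induced by a fixed policy.

    P:      (S, A, S) transition probabilities
    policy: (S, A) action probabilities
    """
    S, A, _ = P.shape
    return np.einsum("sat,tb->satb", P, policy).reshape(S * A, S * A)


def successor_features(P, phi, policy, gamma):
    """Closed-form successor features psi, shape (S, A, d).

    Solves psi = phi + gamma * P_pi @ psi for a tabular MDP.
    phi: (S, A, d) shared features (hypothetical toy values here).
    """
    S, A, d = phi.shape
    P_pi = policy_transition_matrix(P, policy)
    psi = np.linalg.solve(np.eye(S * A) - gamma * P_pi, phi.reshape(S * A, d))
    return psi.reshape(S, A, d)


# Toy problem: every quantity below is randomly generated for illustration.
rng = np.random.default_rng(0)
S, A, d, gamma = 4, 2, 3, 0.9

P = rng.random((S, A, S))
P /= P.sum(axis=-1, keepdims=True)            # normalise into probabilities
phi = rng.random((S, A, d))                   # shared across agents
policy = rng.random((S, A))
policy /= policy.sum(axis=-1, keepdims=True)  # one agent's policy
w = rng.normal(size=d)                        # that agent's preference vector

psi = successor_features(P, phi, policy, gamma)
q_from_sf = psi @ w                           # Q = psi . w, shape (S, A)

# Reference: ordinary policy evaluation on the scalar reward r = phi . w.
r = phi @ w
P_pi = policy_transition_matrix(P, policy)
q_direct = np.linalg.solve(np.eye(S * A) - gamma * P_pi, r.reshape(-1))
q_direct = q_direct.reshape(S, A)

print(np.allclose(q_from_sf, q_direct))
```

Because phi is shared and only w differs between agents, the same psi can be re-scored against a different preference vector with a single matrix product, which is what makes the representation convenient for transfer.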
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Adaptive Dynamic Programming Control
