Graph Backup: Data Efficient Backup Exploiting Markovian Transitions

Zhengyao Jiang; Tianjun Zhang; Robert Kirk; Tim Rocktäschel; Edward Grefenstette

arXiv:2205.15824·cs.LG·June 1, 2022

TL;DR

This paper introduces Graph Backup, a novel method that models environment transitions as a graph to improve value estimation in data-efficient reinforcement learning, outperforming existing multi-step methods.

Contribution

It proposes Graph Backup, a new graph-based backup operator that leverages transition structure for better value estimation in data-efficient RL.

Findings

01. Graph Backup outperforms traditional multi-step methods on benchmark tasks.

02. The method provides stable value estimates regardless of trajectory sampling.

03. Visualization of Atari transition graphs explains performance improvements.

Abstract

The successes of deep Reinforcement Learning (RL) are limited to settings where we have a large stream of online experiences, but applying RL in the data-efficient setting with limited access to online interactions is still challenging. A key to data-efficient RL is good value estimation, but current methods in this space fail to fully utilise the structure of the trajectory data gathered from the environment. In this paper, we treat the transition data of the MDP as a graph, and define a novel backup operator, Graph Backup, which exploits this graph structure for better value estimation. Compared to multi-step backup methods such as $n$-step $Q$-Learning and TD($\lambda$), Graph Backup can perform counterfactual credit assignment and gives stable value estimates for a state regardless of which trajectory the state is sampled from. Our method, when combined with popular value-based…
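The contrast the abstract draws between trajectory-based multi-step targets and a graph-based target can be illustrated with a minimal tabular sketch. This is not the authors' implementation. The function names (`build_graph`, `n_step_return`, `graph_backup`), the toy states, the averaging over observed outcomes, the max over observed actions, the depth limit, and the zero leaf value are all assumptions made for illustration.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed value for this toy example)


def build_graph(trajectories):
    """Merge trajectories into one graph: graph[state][action] -> [(reward, next_state), ...]."""
    graph = defaultdict(lambda: defaultdict(list))
    for trajectory in trajectories:
        for state, action, reward, next_state in trajectory:
            graph[state][action].append((reward, next_state))
    return graph


def n_step_return(trajectory, start, n, leaf_value):
    """Ordinary n-step target: follows a single trajectory, so it depends on that trajectory."""
    total, discount = 0.0, 1.0
    steps = trajectory[start:start + n]
    for _, _, reward, _ in steps:
        total += discount * reward
        discount *= GAMMA
    return total + discount * leaf_value(steps[-1][3])


def graph_backup(graph, state, depth, leaf_value):
    """Depth-limited backup over the merged graph (illustrative assumption, not the paper's exact operator)."""
    if depth == 0 or state not in graph:
        return leaf_value(state)
    action_values = []
    for outcomes in graph[state].values():
        # Average over every observed outcome of this (state, action) pair.
        q = sum(r + GAMMA * graph_backup(graph, s2, depth - 1, leaf_value)
                for r, s2 in outcomes) / len(outcomes)
        action_values.append(q)
    return max(action_values)


def zero(state):
    """Leaf value estimate; zero everywhere in this toy example."""
    return 0.0


# Two trajectories that cross at state "B" but take different actions there.
traj1 = [("A", "go", 0.0, "B"), ("B", "left", 0.0, "T1")]
traj2 = [("D", "go", 0.0, "B"), ("B", "right", 1.0, "T2")]
graph = build_graph([traj1, traj2])

print(n_step_return(traj1, 0, 2, zero))   # target for A along its own trajectory
print(n_step_return(traj2, 0, 2, zero))   # target for D along its own trajectory
print(graph_backup(graph, "A", 2, zero))  # target for A using the merged graph
print(graph_backup(graph, "D", 2, zero))  # target for D using the merged graph
```

In this toy setup the 2-step return for `A` never sees the reward reached from `B` in the other trajectory, whereas the graph-based target for `A` does, and the value computed for `B` is the same whichever trajectory it was sampled from.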

Code & Models

Repositories

ZhengyaoJiang/graphbackup
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics