Graph Backup: Data Efficient Backup Exploiting Markovian Transitions
Zhengyao Jiang, Tianjun Zhang, Robert Kirk, Tim Rocktäschel, Edward Grefenstette

TL;DR
This paper introduces Graph Backup, a novel method that models environment transitions as a graph to improve value estimation in data-efficient reinforcement learning, outperforming existing multi-step methods.
Contribution
It proposes Graph Backup, a new graph-based backup operator that leverages transition structure for better value estimation in data-efficient RL.
Findings
Graph Backup outperforms traditional multi-step methods on benchmark tasks.
The method gives stable value estimates for a state regardless of which trajectory the state is sampled from.
Visualization of Atari transition graphs explains performance improvements.
Abstract
The successes of deep Reinforcement Learning (RL) are limited to settings where we have a large stream of online experiences, but applying RL in the data-efficient setting with limited access to online interactions is still challenging. A key to data-efficient RL is good value estimation, but current methods in this space fail to fully utilise the structure of the trajectory data gathered from the environment. In this paper, we treat the transition data of the MDP as a graph, and define a novel backup operator, Graph Backup, which exploits this graph structure for better value estimation. Compared to multi-step backup methods such as n-step Q-Learning and TD(λ), Graph Backup can perform counterfactual credit assignment and gives stable value estimates for a state regardless of which trajectory the state is sampled from. Our method, when combined with popular value-based…
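To make the idea of backing up over a transition graph concrete, here is a minimal tabular sketch. It is not the authors' implementation: the function names, the depth limit, the bootstrap function v_boot, and the choice to maximise only over actions that were actually observed are all assumptions made for illustration. Transitions from all trajectories are merged by state, so a state's value depends on every outcome recorded for it rather than on the single trajectory it was sampled from.

```python
def build_transition_graph(transitions):
    """Merge (state, action, reward, next_state, done) tuples from all
    trajectories into graph[state][action] -> list of observed outcomes."""
    graph = {}
    for state, action, reward, next_state, done in transitions:
        outcomes = graph.setdefault(state, {}).setdefault(action, [])
        outcomes.append((reward, next_state, done))
    return graph


def backup_value(graph, state, v_boot, gamma=0.99, depth=2):
    """Hypothetical depth-limited backup over the transition graph.

    v_boot(state) is an assumed bootstrap value used at the frontier
    (depth exhausted, or a state with no recorded outgoing transitions).
    """
    actions = graph.get(state)
    if depth == 0 or not actions:
        return v_boot(state)

    q_values = []
    for outcomes in actions.values():
        # Average over every observed outcome of this (state, action) pair.
        total = 0.0
        for reward, next_state, done in outcomes:
            future = 0.0 if done else backup_value(
                graph, next_state, v_boot, gamma, depth - 1)
            total += reward + gamma * future
        q_values.append(total / len(outcomes))
    # Simplification: maximise over observed actions only.
    return max(q_values)


# Two trajectories that share the start state "s0" but diverge afterwards.
transitions = [
    ("s0", "left", 0.0, "s1", False),
    ("s1", "go", 1.0, "end", True),
    ("s0", "right", 0.0, "s2", False),
    ("s2", "go", 5.0, "end", True),
]
graph = build_transition_graph(transitions)
value_s0 = backup_value(graph, "s0", v_boot=lambda s: 0.0, gamma=1.0, depth=2)
```

In this toy example, value_s0 reflects the better branch through "s2" even if "s0" was sampled from the trajectory that went left, which is the kind of counterfactual credit assignment the abstract refers to. An n-step return computed along the left trajectory alone would only see the reward of 1.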
Taxonomy
Topics: Reinforcement Learning in Robotics
