TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning
Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson

TL;DR
This paper introduces TreeQN and ATreeC, differentiable tree-structured models for deep reinforcement learning that plan over a learned abstract state space, with the transition model trained end-to-end alongside the value estimate.
Contribution
TreeQN is a drop-in replacement for the value function network in deep RL with discrete actions, and ATreeC is its actor-critic variant. Both embed a learned transition model in a differentiable look-ahead tree, so planning and value estimation are trained together.
Findings
TreeQN and ATreeC outperform n-step DQN and A2C on a box-pushing task.
They also outperform n-step DQN and value prediction networks on multiple Atari games.
Ablation studies show the importance of auxiliary losses for learning transition models.
Abstract
Combining deep model-free reinforcement learning with on-line planning is a promising approach to building on the successes of deep RL. On-line planning with look-ahead trees has proven successful in environments where transition models are known a priori. However, in complex environments where transition models need to be learned from data, the deficiencies of learned models have limited their utility for planning. To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions. TreeQN dynamically constructs a tree by recursively applying a transition model in a learned abstract state space and then aggregating predicted rewards and state-values using a tree backup to estimate Q-values. We also propose ATreeC, an actor-critic variant that augments…
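The recursive construction described in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: the class name `TinyTreeQ`, the linear reward and value heads, the residual tanh transition, and the plain max backup are all assumptions chosen for brevity. It shows only the forward pass in pure Python; in the paper the whole tree is differentiable and trained end-to-end, which would require an autodiff framework.

```python
import math
import random


def _matrix(rows, cols, rng, scale=0.1):
    # Small random weights; a stand-in for learned parameters.
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def _dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class TinyTreeQ:
    """Hypothetical toy model: tree-structured Q estimate over abstract states."""

    def __init__(self, state_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.num_actions = num_actions
        # One transition matrix per action (assumed form, for illustration).
        self.transition = [_matrix(state_dim, state_dim, rng) for _ in range(num_actions)]
        # One reward weight vector per action, and a single value head.
        self.reward_w = _matrix(num_actions, state_dim, rng)
        self.value_w = [rng.uniform(-0.1, 0.1) for _ in range(state_dim)]

    def step(self, z, a):
        # Action-conditional transition in the abstract state space.
        return [zi + math.tanh(_dot(row, z)) for zi, row in zip(z, self.transition[a])]

    def reward(self, z, a):
        return _dot(self.reward_w[a], z)

    def value(self, z):
        return _dot(self.value_w, z)

    def q_values(self, z, depth, gamma=0.99):
        # Expand every action, recurse to the given depth, then back up
        # predicted rewards and state-values toward the root.
        qs = []
        for a in range(self.num_actions):
            z_next = self.step(z, a)
            if depth == 1:
                backup = self.value(z_next)  # leaf: use the value head
            else:
                backup = max(self.q_values(z_next, depth - 1, gamma))
            qs.append(self.reward(z, a) + gamma * backup)
        return qs


model = TinyTreeQ(state_dim=3, num_actions=2, seed=0)
q = model.q_values([0.5, -0.2, 0.1], depth=2)  # one Q estimate per action
```

The tree has `num_actions ** depth` leaves, so the cost grows exponentially with depth; the sketch is only practical for shallow trees and small discrete action sets.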
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Q-Learning · A2C · Dense Connections · Convolution · Deep Q-Network
