Value Prediction Network
Junhyuk Oh, Satinder Singh, Honglak Lee

TL;DR
The paper introduces Value Prediction Network (VPN), a deep RL architecture that integrates model-free and model-based RL in a single neural network by predicting future values conditioned on options rather than future observations. VPN shows advantages over model-free and model-based baselines in a stochastic environment and outperforms DQN on several Atari games.
Contribution
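Unlike typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sums of rewards) rather than of future observations, integrating model-based and model-free RL into a single neural network.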
Findings
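VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but an accurate observation-prediction model is hard to build.
VPN outperforms Deep Q-Network (DQN) on several Atari games, even with short-lookahead planning.
The Atari results suggest that value prediction is a promising way to learn a good state representation.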
Abstract
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
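To make the idea of planning with option-conditional value predictions concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The learned network is replaced by three hand-written functions on a toy five-cell line world, and every name (`ToyModel`, `q_plan`, `v_plan`, `plan`, `GOAL`) is hypothetical. The rule that averages the direct value estimate with the best lookahead estimate is an illustrative assumption about how a short-lookahead planner might combine the two.

```python
from typing import Callable, NamedTuple, Sequence, Tuple

GOAL = 4            # toy abstract state that yields reward when entered (assumption)
OPTIONS = (-1, +1)  # toy options: step left or step right (assumption)


class ToyModel(NamedTuple):
    """Hand-written stand-ins for the learned components of a value-prediction model."""
    value: Callable[[int], float]                      # V(s): value estimate of an abstract state
    outcome: Callable[[int, int], Tuple[float, float]]  # (reward, discount) for (state, option)
    transition: Callable[[int, int], int]              # next abstract state for (state, option)


def q_plan(model: ToyModel, state: int, option: int, depth: int,
           options: Sequence[int]) -> float:
    """Option-conditional value estimate using `depth` steps of lookahead."""
    reward, discount = model.outcome(state, option)
    next_state = model.transition(state, option)
    return reward + discount * v_plan(model, next_state, depth, options)


def v_plan(model: ToyModel, state: int, depth: int, options: Sequence[int]) -> float:
    """Mix the direct value estimate with the best deeper lookahead (illustrative weights)."""
    v = model.value(state)
    if depth <= 1:
        return v
    best = max(q_plan(model, state, o, depth - 1, options) for o in options)
    return v / depth + (depth - 1) / depth * best


def plan(model: ToyModel, state: int, depth: int, options: Sequence[int]) -> int:
    """Pick the option with the highest planned value; ties go to the earliest option."""
    return max(options, key=lambda o: q_plan(model, state, o, depth, options))


def toy_transition(state: int, option: int) -> int:
    # Move along the line, clamped to the cells 0..GOAL.
    return min(max(state + option, 0), GOAL)


def toy_outcome(state: int, option: int) -> Tuple[float, float]:
    # Reward 1.0 only when the option moves into GOAL from another cell; discount 0.9.
    entered_goal = state != GOAL and toy_transition(state, option) == GOAL
    return (1.0 if entered_goal else 0.0, 0.9)


# An uninformed value function (always 0.0), so any preference comes from lookahead.
model = ToyModel(value=lambda s: 0.0, outcome=toy_outcome, transition=toy_transition)

if __name__ == "__main__":
    print(plan(model, 3, 1, OPTIONS))  # goal is one step away
    print(plan(model, 2, 2, OPTIONS))  # goal is two steps away, needs deeper lookahead
```

In this toy setting, the planner never predicts an observation: it only chains predicted rewards, discounts, and values through abstract states, which is the property the abstract emphasizes. With the uninformed value function, one step of lookahead cannot distinguish the options from cell 2, while two steps can.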
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Data Stream Mining Techniques
