Temporal-Difference Networks
Richard S. Sutton, Brian Tanner

TL;DR
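TD networks generalize conventional TD learning. Rather than relating a single prediction to itself at a later time, they relate each prediction in a set to other predictions in the set at a later time. This lets them learn multi-step, action-conditional, and non-Markovian predictions that conventional TD methods cannot represent, broadening the scope of predictive modeling in reinforcement learning.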
Contribution
Introduction of TD networks, in which each prediction is updated toward other predictions in the set at a later time, allowing multi-step and non-Markovian predictions that conventional TD methods cannot express.
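As a rough illustration, the contrast with conventional TD can be written as below. The notation is chosen here for the sketch and may differ from the paper's formalism: $y$ denotes predictions, $z$ their targets, $c$ a 0/1 condition, $\mathbf{w}$ weights, and $f^i$ the function that builds node $i$'s target.

```latex
% Conventional TD: a single prediction y bootstraps on itself
z_t = o_{t+1} + \gamma\, y_{t+1}, \qquad
y_t \leftarrow y_t + \alpha \left( z_t - y_t \right)

% TD network: node i bootstraps on other nodes' predictions at t+1
z_t^i = f^i\!\left( o_{t+1}, \mathbf{y}_{t+1} \right), \qquad
\mathbf{w}^i \leftarrow \mathbf{w}^i + \alpha \left( z_t^i - y_t^i \right) c_t^i \, \nabla_{\mathbf{w}^i} y_t^i

% Example (fixed-interval chain): z_t^1 = o_{t+1}, \; z_t^i = y_{t+1}^{i-1} \ (i > 1),
% so node y^k estimates the observation o_{t+k}.
```

The key change is that the target $z_t^i$ may depend on any other node's next prediction, not only on node $i$'s own.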
Findings
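TD networks can learn to predict by a fixed interval (e.g., the observation exactly k steps ahead), which is not possible with conventional TD methods.
Making inter-predictive relationships conditional on action makes the learning-efficiency advantage of TD over Monte Carlo (supervised learning) methods particularly pronounced.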
TD networks can learn predictive state representations for non-Markov problems.
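The fixed-interval finding can be illustrated with a minimal sketch. The setup is assumed for illustration and is not taken from the paper: a 7-state random walk that stays put when it bumps a wall, a one-bit observation that is 1 only in the rightmost state, a tabular answer network, and a chain-shaped question network in which node 0 predicts the next observation and node i predicts node i-1's prediction one step later. Under those assumptions, node K-1 should converge to the probability of observing 1 exactly K steps ahead, which the sketch compares against the K-step transition matrix.

```python
import numpy as np

N_STATES = 7  # assumption: small bounded random walk
K = 3         # predict the observation exactly K steps ahead


def step(state, rng):
    """Move left or right with equal probability; bumping a wall stays put."""
    move = 1 if rng.random() < 0.5 else -1
    return min(max(state + move, 0), N_STATES - 1)


def observe(state):
    """Assumed one-bit observation: 1 only in the rightmost state."""
    return 1.0 if state == N_STATES - 1 else 0.0


def train_chain_td_network(k=K, alpha=0.002, n_steps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    # Answer network: one weight per (prediction node, state), i.e. a
    # linear predictor over one-hot state features.
    w = np.zeros((k, N_STATES))
    s = N_STATES // 2
    for _ in range(n_steps):
        s_next = step(s, rng)
        # Question network (chain): node 0 targets the next observation,
        # node i targets node i-1's prediction at the next time step.
        targets = np.empty(k)
        targets[0] = observe(s_next)
        targets[1:] = w[:-1, s_next]
        # TD update of every node's prediction toward its target.
        w[:, s] += alpha * (targets - w[:, s])
        s = s_next
    return w


def exact_k_step(k):
    """P(o_{t+k} = 1 | s_t = s) for every s, from the transition matrix."""
    P = np.zeros((N_STATES, N_STATES))
    for s in range(N_STATES):
        P[s, max(s - 1, 0)] += 0.5
        P[s, min(s + 1, N_STATES - 1)] += 0.5
    return np.linalg.matrix_power(P, k)[:, N_STATES - 1]


if __name__ == "__main__":
    w = train_chain_td_network()
    print("learned K-step:", np.round(w[-1], 2))
    print("exact   K-step:", np.round(exact_k_step(K), 2))
```

No node is ever trained on the K-step outcome directly; the last node only sees its predecessor's prediction, yet the chain of one-step TD updates composes into a K-step prediction.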
Abstract
We introduce a generalization of temporal-difference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each prediction in a set of predictions to other predictions in the set at a later time. TD networks can represent and apply TD learning to a much wider class of predictions than has previously been possible. Using a random-walk example, we show that these networks can be used to learn to predict by a fixed interval, which is not possible with conventional TD methods. Secondly, we show that if the inter-predictive relationships are made conditional on action, then the usual learning-efficiency advantage of TD methods over Monte Carlo (supervised learning) methods becomes particularly pronounced. Thirdly, we demonstrate that TD networks can learn predictive state…
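The action-conditional relationships mentioned in the abstract can be sketched as a single update step for a linear answer network. In this sketch each node carries an assumed 0/1 condition (for example, 1 only when the action actually taken is the one the node's question is about), and nodes whose condition is false are left unchanged on that step. The function name `td_network_update` and the linear form `y = W @ x` are illustrative assumptions, not the paper's code.

```python
import numpy as np


def td_network_update(W, x_t, z_t, c_t, alpha):
    """One action-conditional TD-network step for a linear answer network.

    W     : (n_nodes, n_features) weights; predictions are y_t = W @ x_t
    x_t   : (n_features,) feature vector at time t
    z_t   : (n_nodes,) targets built by the question network from o_{t+1}, y_{t+1}
    c_t   : (n_nodes,) 1.0 where node i's condition held at time t, else 0.0
    alpha : step size
    Returns a new weight matrix; W itself is not modified.
    """
    y_t = W @ x_t
    delta = c_t * (z_t - y_t)  # nodes whose condition failed get zero error
    # For a linear node, the gradient of y_t^i w.r.t. row i of W is x_t.
    return W + alpha * np.outer(delta, x_t)


if __name__ == "__main__":
    # Node 0: "if I step left, what do I observe next?"
    # Node 1: "if I step right, what do I observe next?"
    W = np.zeros((2, 3))
    x = np.array([1.0, 0.0, 0.0])  # one-hot current state
    z = np.array([1.0, 1.0])       # targets for both nodes
    c = np.array([0.0, 1.0])       # the agent stepped right
    print(td_network_update(W, x, z, c, alpha=0.5))
```

Only node 1's row moves toward its target; node 0 learns nothing from a step it did not ask about.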
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Neural Networks and Applications · Gaussian Processes and Bayesian Inference
