Predictive State Temporal Difference Learning
Byron Boots, Geoffrey J. Gordon

TL;DR
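This paper introduces Predictive State Temporal Difference (PSTD) learning, a method that combines reinforcement learning with subspace identification to improve value function approximation by compressing a large set of features into a small set that preserves predictive information.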
Contribution
The paper presents PSTD, a new algorithm that compresses a large feature set into a lower-dimensional one while preserving the predictive information that reinforcement learning needs.
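The compression idea can be illustrated with a short NumPy sketch. It assumes one row of "history" features and one row of "future" features per time step, and keeps the rank-k projection of the history features that retains the most cross-covariance with the future features, found with an SVD as in subspace identification. The helper name `predictive_compression`, the uncentered covariance estimate, and the synthetic data are illustrative assumptions; this is a simplified stand-in, not the paper's exact estimator.

```python
import numpy as np


def predictive_compression(H, F, k):
    """Hypothetical helper (not from the paper): rank-k compression of
    history features H that keeps the directions most predictive of
    future features F.

    H: (T, n_h) array of history features, one row per time step.
    F: (T, n_f) array of future features, aligned row-by-row with H.
    k: target dimension, at most min(n_h, n_f).
    Returns U: (n_h, k); the compressed features are X = H @ U.
    """
    T = H.shape[0]
    # Uncentered empirical cross-covariance between future and history features.
    sigma_fh = F.T @ H / T
    # Top-k right singular vectors: directions in history-feature space
    # that carry the most covariance with the future features.
    _, _, vt = np.linalg.svd(sigma_fh, full_matrices=False)
    return vt[:k].T


# Usage on synthetic data where only the first history feature predicts the future.
rng = np.random.default_rng(0)
H = rng.normal(size=(2000, 5))
F = np.outer(H[:, 0], [1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=(2000, 3))
U = predictive_compression(H, F, k=1)
X = H @ U  # (2000, 1) compressed features
```

Here the single retained direction lines up (up to sign) with the first history feature, so four of the five marginally useful dimensions are discarded without losing what predicts the future.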
Findings
PSTD is statistically consistent.
PSTD outperforms traditional methods on complex tasks.
PSTD effectively reduces feature dimensionality.
Abstract
We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation,…
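On the reinforcement learning side, linear temporal difference learning is commonly implemented as least-squares TD (LSTD). The sketch below is generic LSTD(0), assuming a batch of transitions with feature rows `X` at time t and `X_next` at time t+1; in a PSTD-style pipeline those rows would be compressed features. The function name `lstd`, the ridge term `reg`, and the batch layout are illustrative assumptions, not the paper's procedure.

```python
import numpy as np


def lstd(X, X_next, rewards, gamma, reg=1e-6):
    """Least-squares TD(0): find w with V(s) ~= phi(s) @ w.

    X:       (T, k) features of the state at time t.
    X_next:  (T, k) features of the state at time t + 1.
    rewards: (T,) reward received on each transition.
    """
    # Sample estimates of A = E[phi_t (phi_t - gamma * phi_{t+1})^T] and
    # b = E[phi_t r_t]; the common 1/T factor cancels, so it is omitted.
    A = X.T @ (X - gamma * X_next)
    b = X.T @ rewards
    # Small ridge term keeps the system solvable when features are nearly collinear.
    return np.linalg.solve(A + reg * np.eye(X.shape[1]), b)


# Usage: a two-state chain 0 -> 1 -> 0 with one-hot features,
# reward 1 on leaving state 0 and reward 0 on leaving state 1.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
X_next = np.array([[0.0, 1.0], [1.0, 0.0]])
rewards = np.array([1.0, 0.0])
w = lstd(X, X_next, rewards, gamma=0.5)  # w is approximately [4/3, 2/3]
```

With one-hot features LSTD recovers the exact values from the Bellman equations V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), namely V(0) = 4/3 and V(1) = 2/3; with a compressed feature set the same solve yields the best linear fixed point in that smaller space.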
