PAC Reinforcement Learning for Predictive State Representations
Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

TL;DR
This paper introduces a novel model-based reinforcement learning algorithm for Predictive State Representations (PSRs) that achieves polynomial sample complexity, enabling near-optimal policy learning in partially observable systems with large state spaces.
Contribution
It develops the first polynomial sample complexity algorithm for PSRs that works with function approximation and applies to various POMDP subclasses.
Findings
The algorithm learns near-optimal policies with polynomial sample complexity.
Sample complexity scales polynomially with model class complexity, not state space size.
Applicable to multiple POMDP variants, including low-rank and linear models.
Abstract
In this paper we study online Reinforcement Learning (RL) in partially observable dynamical systems. We focus on the Predictive State Representations (PSRs) model, an expressive model that captures other well-known models such as Partially Observable Markov Decision Processes (POMDPs). A PSR represents the state using a set of predictions of future observations and is defined entirely in terms of observable quantities. We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in all the relevant parameters of the system. Our algorithm naturally works with function approximation, extending to systems with potentially large state and observation spaces. We show that given a realizable model class, the sample complexity of learning the near-optimal policy only scales polynomially with respect to the…
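To make the idea of a predictive state concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a hand-made two-state POMDP whose transition and observation probabilities (`T`, `O`) are invented for illustration, and it computes the predictions from the known latent model. The point is only that the resulting vector is indexed by observable one-step tests (action, observation) rather than by hidden states. A learning algorithm for PSRs would have to estimate such predictions from interaction data instead.

```python
# Illustrative only: a toy POMDP with made-up numbers (not from the paper).
# T[a][s][s2] = P(next state s2 | state s, action a)
T = {
    0: [[0.9, 0.1], [0.2, 0.8]],
    1: [[0.5, 0.5], [0.5, 0.5]],
}
# O[s2][o] = P(observation o | next state s2)
O = [[0.8, 0.2], [0.3, 0.7]]
ACTIONS = (0, 1)
OBSERVATIONS = (0, 1)
N_STATES = 2


def one_step_prediction(belief, action, obs):
    """P(obs | history, do(action)), computed via the belief over hidden states."""
    return sum(
        belief[s] * T[action][s][s2] * O[s2][obs]
        for s in range(N_STATES)
        for s2 in range(N_STATES)
    )


def update_belief(belief, action, obs):
    """Standard Bayes filter step after taking `action` and seeing `obs`."""
    unnormalized = [
        O[s2][obs] * sum(belief[s] * T[action][s][s2] for s in range(N_STATES))
        for s2 in range(N_STATES)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def predictive_state(history, initial_belief=(0.5, 0.5)):
    """Map a history of (action, observation) pairs to predictions of one-step tests."""
    belief = list(initial_belief)
    for action, obs in history:
        belief = update_belief(belief, action, obs)
    return {
        (a, o): one_step_prediction(belief, a, o)
        for a in ACTIONS
        for o in OBSERVATIONS
    }


if __name__ == "__main__":
    # Predictions of every one-step test after the history (a=0, o=1), (a=1, o=0).
    for test, prob in predictive_state([(0, 1), (1, 0)]).items():
        print(test, round(prob, 4))
```

The dictionary returned by `predictive_state` plays the role of the state: two histories that yield the same predictions for the chosen tests are treated as equivalent, without reference to the hidden states used here to generate the numbers.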
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Machine Learning and Algorithms
