Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency
Qi Cai, Zhuoran Yang, Zhaoran Wang

TL;DR
This paper introduces a new reinforcement learning algorithm for partially observed MDPs with linear structure, achieving provable sample efficiency independent of observation and state space sizes.
Contribution
It bridges partial observability and function approximation in POMDPs, providing the first sample-efficient RL algorithm with theoretical guarantees for this setting.
Findings
Achieves an $\epsilon$-optimal policy in $O(1/\epsilon^2)$ episodes.
Sample complexity scales polynomially with the intrinsic dimension.
Independence from observation and state space sizes.
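As a reading aid, the first two findings can be unpacked as below. This uses the standard definition of $\epsilon$-optimality rather than notation taken from the paper: the symbols $V^{\pi}$, $\pi^\star$, $\hat{\pi}$, $K$, and $d$ are assumptions introduced here. The degree of the polynomial and any dependence on other problem parameters are left unspecified because this summary does not state them.

```latex
% Standard notion of an \epsilon-optimal policy (notation assumed, not from the paper):
% V^{\pi} is the expected total reward of policy \pi, and \pi^\star maximizes it.
V^{\pi^\star} - V^{\hat{\pi}} \le \epsilon

% Number of episodes K consistent with the findings above, where d is the
% intrinsic dimension of the linear structure (other factors suppressed):
K = \mathrm{poly}(d) \cdot O\!\left(1/\epsilon^{2}\right)
```

Read this way, halving $\epsilon$ roughly quadruples the number of episodes, and the sizes of the observation and state spaces do not appear in the bound.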
Abstract
We study reinforcement learning for partially observed Markov decision processes (POMDPs) with infinite observation and state spaces, which remains less investigated theoretically. To this end, we make the first attempt at bridging partial observability and function approximation for a class of POMDPs with a linear structure. In detail, we propose a reinforcement learning algorithm (Optimistic Exploration via Adversarial Integral Equation or OP-TENET) that attains an $\epsilon$-optimal policy within $O(1/\epsilon^2)$ episodes. In particular, the sample complexity scales polynomially in the intrinsic dimension of the linear structure and is independent of the size of the observation and state spaces. The sample efficiency of OP-TENET is enabled by a sequence of ingredients: (i) a Bellman operator with finite memory, which represents the value function in a recursive manner, (ii) the…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Reinforcement Learning in Robotics
