Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency

Qi Cai; Zhuoran Yang; Zhaoran Wang

arXiv:2204.09787·cs.LG·April 2, 2024·5 cites


Open Access

TL;DR

This paper introduces a new reinforcement learning algorithm for partially observed MDPs with linear structure, achieving provable sample efficiency independent of observation and state space sizes.

Contribution

The paper makes a first attempt at bridging partial observability and function approximation for a class of POMDPs with a linear structure, proposing OP-TENET, a sample-efficient RL algorithm with theoretical guarantees for this setting.

Findings

01

Achieves an $\epsilon$-optimal policy within $O(1/\epsilon^{2})$ episodes.

02

Sample complexity scales polynomially in the intrinsic dimension of the linear structure.

03

Sample complexity is independent of the sizes of the observation and state spaces.
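
To make the scaling in the findings above concrete, here is a minimal sketch of how an episode budget of the form $c \cdot d^{p} / \epsilon^{2}$ responds to a change in $\epsilon$. The constant `c`, the exponent `p`, and the function name `episode_budget` are hypothetical placeholders chosen for illustration; the source states only the $1/\epsilon^{2}$ dependence and a polynomial dependence on the intrinsic dimension, with constants hidden in the $O$-notation.

```python
import math


def episode_budget(epsilon: float, dim: int, c: float = 1.0, p: int = 2) -> int:
    """Illustrative episode count c * dim**p / epsilon**2.

    c and p are hypothetical placeholders, not values from the paper.
    Note that the sizes of the observation and state spaces do not
    appear as arguments: the budget depends only on epsilon and dim.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if dim <= 0:
        raise ValueError("dim must be positive")
    return math.ceil(c * dim**p / epsilon**2)


if __name__ == "__main__":
    # Halving epsilon multiplies the illustrative budget by four.
    for eps in (0.5, 0.25, 0.125):
        print(f"epsilon={eps}: {episode_budget(eps, dim=10)} episodes")
```

Under these assumptions, halving the target accuracy $\epsilon$ quadruples the number of episodes, while the budget is unaffected by how large the observation or state space is.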

Abstract

We study reinforcement learning for partially observed Markov decision processes (POMDPs) with infinite observation and state spaces, which remains less investigated theoretically. To this end, we make the first attempt at bridging partial observability and function approximation for a class of POMDPs with a linear structure. In detail, we propose a reinforcement learning algorithm (Optimistic Exploration via Adversarial Integral Equation or OP-TENET) that attains an $\epsilon$-optimal policy within $O(1/\epsilon^{2})$ episodes. In particular, the sample complexity scales polynomially in the intrinsic dimension of the linear structure and is independent of the size of the observation and state spaces. The sample efficiency of OP-TENET is enabled by a sequence of ingredients: (i) a Bellman operator with finite memory, which represents the value function in a recursive manner, (ii) the…


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Reinforcement Learning in Robotics