A PAC RL Algorithm for Episodic POMDPs
Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill

TL;DR
This paper introduces a PAC RL algorithm for a class of episodic POMDPs, with a polynomial bound on the number of episodes in which the algorithm may fail to act near-optimally.
Contribution
To the authors' knowledge, it presents the first partially observable RL algorithm with polynomial sample complexity for an important class of episodic POMDPs, building on recent method-of-moments techniques for latent variable model estimation.
Findings
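Gives a polynomial bound on the number of episodes in which the algorithm may fail to achieve near-optimal performance, where existing bounds are at least exponential in the episode length
Applies to an important class of episodic POMDPs
Builds on recent advances in method of moments for latent variable model estimation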
Abstract
Many interesting real-world domains involve reinforcement learning (RL) in partially observable environments. Efficient learning in such domains is important, but existing sample complexity bounds for partially observable RL are at least exponential in the episode length. We give, to our knowledge, the first partially observable RL algorithm with a polynomial bound on the number of episodes on which the algorithm may not achieve near-optimal performance. Our algorithm is suitable for an important class of episodic POMDPs. Our approach builds on recent advances in method of moments for latent variable model estimation.
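For readers new to PAC-style guarantees, the bound described in the abstract can be sketched in the generic form below. This is the standard episodic PAC RL criterion, not a statement copied from the paper: the symbols \pi_t (the policy followed in episode t), V^{*} (the optimal value), \epsilon, \delta, and N are notation assumed here for illustration, and the exact quantities the paper's polynomial depends on are not given in the abstract.

\[
\Pr\!\left( \sum_{t=1}^{\infty} \mathbb{1}\!\left[ V^{\pi_t} < V^{*} - \epsilon \right] \le N(\epsilon, \delta) \right) \ge 1 - \delta
\]

Here N(\epsilon, \delta) would be polynomial in 1/\epsilon, 1/\delta, and the relevant problem parameters, whereas the earlier bounds mentioned in the abstract grow at least exponentially with the episode length.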
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
