A PAC RL Algorithm for Episodic POMDPs

Zhaohan Daniel Guo; Shayan Doroudi; Emma Brunskill

arXiv:1605.08062·cs.LG·June 2, 2016·2 cites

TL;DR

This paper introduces a PAC RL algorithm for a class of episodic POMDPs, with a polynomial bound on the number of episodes in which the algorithm may fail to achieve near-optimal performance.

Contribution

It presents what is, to the authors' knowledge, the first partially observable RL algorithm with polynomial sample complexity for an important class of episodic POMDPs, building on recent method-of-moments techniques for latent variable model estimation.

Findings

01

Gives a polynomial bound on the number of episodes in which performance may not be near-optimal

02

Applicable to an important class of episodic POMDPs

03

Builds on recent advances in method of moments for latent variable models

Abstract

Many interesting real world domains involve reinforcement learning (RL) in partially observable environments. Efficient learning in such domains is important, but existing sample complexity bounds for partially observable RL are at least exponential in the episode length. We give, to our knowledge, the first partially observable RL algorithm with a polynomial bound on the number of episodes on which the algorithm may not achieve near-optimal performance. Our algorithm is suitable for an important class of episodic POMDPs. Our approach builds on recent advances in method of moments for latent variable model estimation.
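
The guarantee described in the abstract has the usual shape of an episodic PAC bound. As a rough sketch only, using notation assumed here rather than taken from the paper ($\pi_k$ for the policy run in episode $k$, $V^{\pi}$ for the expected return of an episode under $\pi$, $V^{*}$ for the best achievable value, $\epsilon$ for the accuracy and $\delta$ for the failure probability):

```latex
\Pr\left[ \sum_{k=1}^{\infty} \mathbf{1}\left\{ V^{\pi_k} < V^{*} - \epsilon \right\} \le N(\epsilon, \delta) \right] \ge 1 - \delta,
\qquad
N(\epsilon, \delta) = \mathrm{poly}\left( \text{problem parameters},\ \tfrac{1}{\epsilon},\ \tfrac{1}{\delta} \right)
```

The exact quantities entering $N$, and the class of POMDPs for which the bound holds, are specified in the paper itself. The contrast drawn in the abstract is with earlier bounds that grow at least exponentially in the episode length.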

Taxonomy

Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research