Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

Masatoshi Uehara; Ayush Sekhari; Jason D. Lee; Nathan Kallus; Wen Sun

arXiv:2206.12081 · cs.LG · June 27, 2022


TL;DR

This paper introduces a computationally efficient reinforcement learning algorithm for large-scale POMDPs with deterministic latent transitions, leveraging Hilbert space embeddings and function approximation to find the exact optimal policy.

Contribution

It develops a polynomial-time, statistically efficient algorithm for POMDPs with deterministic latent transitions and conditional embeddings that finds the exact optimal policy, with complexity that does not depend on the size of the state or observation space.

Findings

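1. The algorithm's computational and statistical complexity scales polynomially with the horizon and the intrinsic (feature) dimension.

2. The guarantees require deterministic latent transitions and a gap in the optimal Q-function across actions.

3. The algorithm is guaranteed to find the exact optimal policy in large-scale POMDPs.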
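Why an action gap permits an exact, rather than approximate, optimal policy can be illustrated with a standard argument: if an estimate Q̂ is within gap/2 of Q* for every action, then for the optimal action a* and any other action a, Q̂(s, a*) > Q*(s, a*) - gap/2 ≥ Q*(s, a) + gap/2 > Q̂(s, a), so the greedy action under Q̂ is the optimal one. The sketch below is a toy numerical check of that argument, not the paper's algorithm. The per-action linear form Q*(s, a) = ⟨θ_a, ψ(s)⟩, the random features, the minimum-over-states gap, and the helper names are all hypothetical.

```python
import numpy as np


def action_gap(q_row):
    """Gap between the best and second-best action values in one state."""
    top_two = np.sort(q_row)[-2:]
    return top_two[1] - top_two[0]


def greedy_matches_optimal(q_true, q_hat):
    """True if the greedy action under q_hat equals the one under q_true in every state."""
    return bool(np.all(q_true.argmax(axis=1) == q_hat.argmax(axis=1)))


rng = np.random.default_rng(0)
n_states, n_actions, d = 5, 3, 4

# Hypothetical linear model: Q*(s, a) = <theta_a, psi(s)>
psi = rng.normal(size=(n_states, d))      # latent-state features psi(s)
theta = rng.normal(size=(n_actions, d))   # one weight vector per action
q_true = psi @ theta.T                    # shape (n_states, n_actions)

# Smallest best-vs-second-best gap over all states (an assumed definition)
gap = min(action_gap(row) for row in q_true)

# Perturb every entry by strictly less than gap / 2 (here at most 0.49 * gap)
noise = rng.uniform(-1.0, 1.0, size=q_true.shape) * 0.49 * gap
q_hat = q_true + noise

print(greedy_matches_optimal(q_true, q_hat))  # prints True
```

If the error bound is loosened to, say, 0.51 × gap, the argument no longer guarantees a match, even though a particular random draw may still select the right actions.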

Abstract

We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. In particular, we consider Hilbert space embeddings of POMDPs where the feature of latent states and the feature of observations admit a conditional Hilbert space embedding of the observation emission process, and the latent state transition is deterministic. Under the function approximation setup where the optimal latent state-action $Q$-function is linear in the state feature, and the optimal $Q$-function has a gap in actions, we provide a computationally and statistically efficient algorithm for finding the exact optimal policy. We show that our algorithm's computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension…
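In finite dimensions, the conditional embedding assumption in the abstract can be read as E[φ(o) | s] = K ψ(s) for some linear operator K mapping latent-state features ψ(s) to the expected observation feature φ(o). The toy sketch below assumes that reading together with a made-up linear-Gaussian emission model, and checks that averaging observation features emitted from one fixed latent state recovers K ψ(s). The operator K, the feature dimensions, the noise model, and the function name are hypothetical; the sketch does not reproduce the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
d_state, d_obs = 3, 6

# Hypothetical conditional embedding operator: E[phi(o) | s] = K @ psi(s)
K = rng.normal(size=(d_obs, d_state))


def sample_obs_features(psi_s, n, noise_scale=0.5):
    """Draw n observation features phi(o) emitted from a latent state with feature psi_s.

    Toy emission model (an assumption): phi(o) = K @ psi(s) + isotropic Gaussian noise,
    so the conditional mean of phi(o) given s is exactly K @ psi(s).
    """
    return K @ psi_s + rng.normal(scale=noise_scale, size=(n, d_obs))


psi_s = rng.normal(size=d_state)             # feature of one fixed latent state
samples = sample_obs_features(psi_s, n=20000)
emp_mean = samples.mean(axis=0)              # empirical embedding of the emission
err = np.max(np.abs(emp_mean - K @ psi_s))   # worst coordinate-wise error
print(f"max error: {err:.4f}")
```

Under this toy noise model each coordinate of the average has standard deviation 0.5/√n, so the error shrinks roughly like 1/√n as more observations from the same latent state are collected.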


Taxonomy

Topics: Reinforcement Learning in Robotics