RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

Jeongyeol Kwon; Shie Mannor; Constantine Caramanis; Yonathan Efroni

arXiv:2406.01389·cs.LG·June 27, 2024

Open Access

TL;DR

This paper presents the first sample-efficient algorithm for Latent Markov Decision Processes (LMDPs) that does not rely on structural assumptions, using novel off-policy evaluation techniques to achieve near-optimal guarantees.

Contribution

It introduces a new off-policy evaluation lemma and coverage coefficient for LMDPs, enabling provably efficient exploration without structural assumptions.

Findings

1. First sample-efficient algorithm for general LMDPs
2. Establishes a new off-policy evaluation lemma
3. Achieves near-optimal exploration guarantees

Abstract

In many real-world decision problems there is partially observed, hidden or latent information that remains fixed throughout an interaction. Such decision problems can be modeled as Latent Markov Decision Processes (LMDPs), where a latent variable is selected at the beginning of an interaction and is not disclosed to the agent. In the last decade, there has been significant progress in solving LMDPs under different structural assumptions. However, for general LMDPs, there is no known learning algorithm that provably matches the existing lower bound (Kwon et al., 2021). We introduce the first sample-efficient algorithm for LMDPs without any additional structural assumptions. Our result builds off a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed…
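To make the LMDP interaction protocol described in the abstract concrete, here is a minimal sketch of one episode in a toy tabular LMDP. It is not the paper's algorithm. The function name `sample_episode`, the nested-list layout of `transitions` and `rewards`, the fixed start state 0, and the deterministic rewards are all assumptions made for illustration.

```python
import random

def sample_episode(mixing_weights, transitions, rewards, policy, horizon, rng):
    """Roll out one episode of a toy tabular LMDP (hypothetical layout).

    transitions[m][s][a] is a list of next-state probabilities and
    rewards[m][s][a] is a deterministic reward, both for latent context m.
    """
    # The latent context m is drawn once, at the start of the episode,
    # and stays fixed for the whole interaction.
    m = rng.choices(range(len(mixing_weights)), weights=mixing_weights)[0]
    state, history = 0, []  # assumption: every episode starts in state 0
    for _ in range(horizon):
        # The policy sees only the current state and the observed history,
        # never the latent context m.
        action = policy(state, history)
        reward = rewards[m][state][action]
        history.append((state, action, reward))
        probs = transitions[m][state][action]
        state = rng.choices(range(len(probs)), weights=probs)[0]
    return history  # m is deliberately not returned to the agent
```

Because the context is hidden, the agent can only infer it from the states and rewards it observes within the episode, which is why a policy in this setting generally needs to depend on the history rather than on the current state alone.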

Taxonomy

Topics: Auction Theory and Applications