Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes

Runlong Zhou; Ruosong Wang; Simon S. Du

arXiv:2210.11604·cs.LG·May 23, 2023

TL;DR

This paper introduces a model-based reinforcement learning framework for Latent Markov Decision Processes (LMDPs) with context in hindsight. The framework achieves nearly horizon-free, variance-dependent regret bounds, and a matching-order lower bound shows it is close to minimax optimal.

Contribution

The paper presents the first nearly horizon-free regret bounds for LMDPs, along with a variance-dependent analysis and a new lower bound showing that the upper bound is minimax optimal up to a $\sqrt{\Gamma}$ factor and logarithmic factors.

Findings

01

Achieves an $\tilde{O}(\sqrt{\mathrm{Var}^{\star} M \Gamma S A K})$ regret bound.

02

First problem-dependent regret bound for LMDPs.

03

Provides a novel $\Omega(\sqrt{\mathrm{Var}^{\star} M S A K})$ regret lower bound.
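Dividing the upper bound in finding 01 by the lower bound in finding 03 shows exactly where the two differ. This is a quick check added here, treating the logarithmic factors hidden by $\tilde{O}$ as constants:

$$
\frac{\sqrt{\mathrm{Var}^{\star} M \Gamma S A K}}{\sqrt{\mathrm{Var}^{\star} M S A K}} = \sqrt{\Gamma} \leq \sqrt{S}
$$

So the bounds agree up to a $\sqrt{\Gamma}$ factor and logarithmic factors. When $\Gamma$ is a constant (each state-action pair has only a bounded number of possible next states), the upper bound matches the lower bound up to logarithmic factors.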

Abstract

We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight. We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver. We prove an $\tilde{O}(\sqrt{\mathrm{Var}^{\star} M \Gamma S A K})$ regret bound, where $\tilde{O}$ hides logarithmic factors, $M$ is the number of contexts, $S$ is the number of states, $A$ is the number of actions, $K$ is the number of episodes, $\Gamma \leq S$ is the maximum transition degree of any state-action pair, and $\mathrm{Var}^{\star}$ is a variance quantity describing the determinism of the LMDP. The regret bound scales only logarithmically with the planning horizon, thus yielding the first (nearly) horizon-free regret bound for LMDPs. This is also the first problem-dependent regret bound for LMDPs. Key in our proof is…
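To make the quantities in the abstract concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' code. It assumes the LMDP is given as mixing weights `w` over the $M$ contexts, transition tensors `P` of shape `(M, S, A, S)`, and deterministic rewards `R` of shape `(M, S, A)`. These names and the array layout are hypothetical.

The three helpers do the following:

- `transition_degree` computes $\Gamma$ as the largest number of next states with nonzero probability. We assume the maximum is taken over contexts as well as state-action pairs.
- `leading_term` evaluates $\sqrt{\mathrm{Var}^{\star} M \Gamma S A K}$ with constants and logarithmic factors dropped.
- `sample_episode` rolls out one episode. The context is drawn at the start, hidden from the policy, and returned only after the episode ends. This is our reading of "context in hindsight".

```python
import numpy as np


def transition_degree(P: np.ndarray) -> int:
    """Maximum transition degree Gamma (hypothetical helper).

    Assumes P[m, s, a, s2] = probability of moving to s2 from (s, a) in
    context m. The degree of (m, s, a) is the size of the support of
    P[m, s, a]; Gamma is the largest degree over all m, s, a.
    """
    if P.ndim != 4:
        raise ValueError("expected P with shape (M, S, A, S)")
    support_sizes = np.count_nonzero(P > 0, axis=-1)  # shape (M, S, A)
    return int(support_sizes.max())


def leading_term(var_star: float, M: int, Gamma: int, S: int, A: int, K: int) -> float:
    """sqrt(Var* M Gamma S A K): the upper bound without constants or log factors."""
    return float(np.sqrt(var_star * M * Gamma * S * A * K))


def sample_episode(w, P, R, s0, policy, H, rng):
    """Roll out one episode of a hypothetical LMDP with context in hindsight.

    The policy sees only the step index, the current state, and the
    trajectory so far; the latent context m is returned after the episode.
    Rewards are assumed deterministic: R[m, s, a].
    """
    m = rng.choice(len(w), p=w)  # latent context, never passed to the policy
    s, total = s0, 0.0
    trajectory = []
    for h in range(H):
        a = policy(h, s, trajectory)  # history-dependent, since m is unobserved
        total += R[m, s, a]
        trajectory.append((s, a))
        s = rng.choice(P.shape[-1], p=P[m, s, a])  # next state in context m
    return trajectory, total, m  # context revealed only in hindsight
```

`leading_term` takes no planning-horizon argument. This mirrors the abstract's point that, apart from the hidden logarithmic factors, the bound does not grow with the horizon.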

Videos

Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes· slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics