Eluder-based Regret for Stochastic Contextual MDPs

Orin Levy; Asaf Cassel; Alon Cohen; Yishay Mansour

arXiv:2211.14932 · cs.LG · May 30, 2024

TL;DR

This paper introduces the E-UC$^3$RL algorithm for regret minimization in stochastic contextual MDPs, leveraging Eluder dimension and offline regression oracles to achieve rate-optimal performance under minimal assumptions.

Contribution

The paper presents the first efficient, rate-optimal regret minimization algorithm for CMDPs using general offline function approximation, and extends the Eluder dimension to general bounded metrics.

Findings

01

Achieves a regret bound of $\tilde{O}(H^3 \sqrt{T |S| |A| d_E(P) \log(|F||P|/\delta)})$.

02

First efficient and rate-optimal algorithm for CMDPs with offline function approximation.

03

Extends Eluder dimension to general bounded metrics.
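
To make the scaling in finding 01 concrete, the sketch below evaluates the leading term of the regret bound as stated in the abstract, $H^3 \sqrt{T |S| |A| d_E(P) \log(|F||P|/\delta)}$. It ignores constants and the logarithmic factors hidden by the $\tilde{O}$ notation, so only ratios between values are meaningful. The function name and all example problem sizes are illustrative assumptions, not taken from the paper.

```python
import math

def regret_leading_term(H, T, num_states, num_actions, eluder_dim,
                        size_F, size_P, delta):
    """Leading term H^3 * sqrt(T |S| |A| d_E(P) log(|F||P|/delta)).

    Constants and the log factors hidden by tilde-O are ignored,
    so this is useful for comparing scalings, not absolute values.
    """
    log_term = math.log(size_F * size_P / delta)
    return H ** 3 * math.sqrt(
        T * num_states * num_actions * eluder_dim * log_term
    )

# Hypothetical problem sizes, chosen only for illustration.
common = dict(num_states=10, num_actions=4, eluder_dim=3,
              size_F=100, size_P=100, delta=0.05)
base = regret_leading_term(H=5, T=10_000, **common)
more_episodes = regret_leading_term(H=5, T=40_000, **common)
longer_horizon = regret_leading_term(H=10, T=10_000, **common)

# Quadrupling T doubles the term (sqrt(T) rate);
# doubling H multiplies it by 2^3 = 8.
print(more_episodes / base, longer_horizon / base)
```

The square-root dependence on $T$ is what "rate-optimal" refers to here, while the horizon enters through the much steeper $H^3$ factor.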

Abstract

We present the E-UC$^3$RL algorithm for regret minimization in Stochastic Contextual Markov Decision Processes (CMDPs). The algorithm operates under the minimal assumptions of a realizable function class and access to offline least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys a regret guarantee of $\tilde{O}(H^3 \sqrt{T |S| |A| d_E(P) \log(|F||P|/\delta)})$, where $T$ is the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, $P$ and $F$ are finite function classes used to approximate the context-dependent dynamics and rewards, respectively, and $d_E(P)$ is the Eluder dimension of $P$ w.r.t. the Hellinger distance. To the best of our knowledge, our algorithm is the first…
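
The abstract measures the Eluder dimension of the dynamics class with respect to the Hellinger distance. As a minimal sketch of that distance for discrete next-state distributions, the code below uses the normalized convention $\frac{1}{2} \sum_i (\sqrt{p_i} - \sqrt{q_i})^2$, which lies in $[0, 1]$. Some texts omit the $\frac{1}{2}$ factor, and the convention used in the paper is not stated in this excerpt, so treat the normalization as an assumption. The function name and the example distributions are hypothetical.

```python
import math

def squared_hellinger(p, q):
    """Squared Hellinger distance between two discrete distributions.

    Assumed convention: 0.5 * sum((sqrt(p_i) - sqrt(q_i))^2), in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    return 0.5 * sum(
        (math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)
    )

# Hypothetical next-state distributions over three states.
p_true = [0.7, 0.2, 0.1]
p_model = [0.6, 0.3, 0.1]

same = squared_hellinger(p_true, p_true)            # identical distributions
close = squared_hellinger(p_true, p_model)          # nearby distributions
disjoint = squared_hellinger([1.0, 0.0], [0.0, 1.0])  # disjoint supports

print(same, close, disjoint)
```

Because the distance is bounded regardless of the distributions compared, it is an example of the bounded metrics to which the paper extends the Eluder dimension.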

Taxonomy

Topics: Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · Advanced Bandit Algorithms Research