Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP

Orin Levy; Yishay Mansour

arXiv:2207.11126 · cs.LG · January 24, 2023 · 1 citation

Open Access

TL;DR

This paper introduces regret minimization algorithms for stochastic contextual MDPs (CMDPs) that rely on an offline least squares regression oracle. It covers three settings (known dynamics, unknown context-independent dynamics, and unknown context-dependent dynamics) and provides regret upper bounds together with a lower bound.

Contribution

It presents the first optimistic algorithms for contextual MDPs with general function approximation, covering known, unknown, and context-dependent dynamics, with theoretical regret guarantees.

Findings

01

Achieves a high-probability regret bound for the hardest setting, where the dynamics are unknown and context-dependent.

02

Provides a lower bound on expected regret even with known dynamics.

03

Extends results to CMDPs without minimum reachability, with sublinear regret.

Abstract

We present regret minimization algorithms for stochastic contextual MDPs under a minimum reachability assumption, using access to an offline least squares regression oracle. We analyze three settings: the dynamics are known; the dynamics are unknown but independent of the context; and, most challenging, the dynamics are unknown and context-dependent. For the last setting, our algorithm obtains a regret bound of $\widetilde{O}\left((H + 1/p_{min}) H |S|^{3/2} \sqrt{|A| T \log(\max\{|\mathcal{G}|, |\mathcal{P}|\}/\delta)}\right)$ with probability $1 - \delta$, where $\mathcal{P}$ and $\mathcal{G}$ are finite and realizable function classes used to approximate the dynamics and rewards respectively, $p_{min}$ is the minimum reachability parameter, $S$ is the set of states, $A$ the set of actions, $H$ the horizon, and $T$ the number of episodes. To our knowledge, our approach is…
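The offline least squares regression oracle mentioned in the abstract can be illustrated with a minimal sketch: given a finite function class and a batch of examples, it returns the member with the smallest empirical squared loss. This is only an illustration of the general notion, not the paper's algorithm. The names `least_squares_oracle`, `Example`, and `Predictor` are hypothetical, and the scalar inputs stand in for whatever the paper's oracle actually regresses on (for instance context, state, and action).

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration only; they are not taken from the paper.
Example = Tuple[float, float]        # (input x, observed target y)
Predictor = Callable[[float], float]


def least_squares_oracle(
    function_class: Sequence[Predictor], data: Sequence[Example]
) -> Predictor:
    """Return the member of a finite class with the smallest empirical squared loss."""

    def squared_loss(f: Predictor) -> float:
        # Sum of squared residuals of f over the whole offline batch.
        return sum((f(x) - y) ** 2 for x, y in data)

    # Exhaustive search is possible because the class is assumed finite.
    return min(function_class, key=squared_loss)


# Toy usage: three candidate predictors, noisy data generated near y = 2x.
candidates = [lambda x: 0.0, lambda x: x, lambda x: 2.0 * x]
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
best = least_squares_oracle(candidates, data)
```

In this toy run the oracle selects the third candidate, since it has the smallest sum of squared residuals on the batch. Realizability, as assumed in the abstract, would mean the true function is itself a member of the class.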

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Water resources management and optimization · Advanced Bandit Algorithms Research