Dynamic Regret of Online Markov Decision Processes

Peng Zhao; Long-Fei Li; Zhi-Hua Zhou

arXiv:2208.12483·cs.LG·August 29, 2022

TL;DR

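This paper studies online Markov Decision Processes with adversarially changing loss functions and known transitions, using dynamic regret as the performance measure. It proposes online ensemble algorithms for three models (episodic loop-free SSP, episodic SSP, and infinite-horizon MDPs), with minimax optimal guarantees for episodic (loop-free) SSP, and examines how environment predictability affects the bounds.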

Contribution

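It introduces novel online ensemble algorithms for three online MDP models and establishes dynamic regret bounds for each; the bounds for episodic (loop-free) SSP are provably minimax optimal in the time horizon and a non-stationarity measure. It also analyzes the impact of environment predictability.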

Findings

01. Minimax optimal dynamic regret for episodic SSP.

02. Improved regret bounds in predictable environments.

03. Impossibility results for infinite-horizon MDPs.

Abstract

We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions. We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any sequence of feasible changing policies. The measure is strictly stronger than the standard static regret that benchmarks the learner's performance with a fixed compared policy. We consider three foundational models of online MDPs, including episodic loop-free Stochastic Shortest Path (SSP), episodic SSP, and infinite-horizon MDPs. For these three models, we propose novel online ensemble algorithms and establish their dynamic regret guarantees respectively, in which the results for episodic (loop-free) SSP are provably minimax optimal in terms of time horizon and certain non-stationarity measure. Furthermore, when the online environments encountered by…
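
To make the distinction between static and dynamic regret concrete, here is a minimal Python sketch. It is not the paper's algorithm or its exact setting: it assumes a toy problem with a small finite set of policies whose per-round expected losses are given in a table, and the function names (`static_regret`, `dynamic_regret`, `num_switches`) are hypothetical. The switch count is only a crude stand-in for a non-stationarity measure; the abstract does not specify which measure the paper uses.

```python
from typing import List, Sequence


def total_loss(losses: Sequence[Sequence[float]], choices: Sequence[int]) -> float:
    """Cumulative loss of playing choices[t] at round t."""
    return sum(losses[t][k] for t, k in enumerate(choices))


def static_regret(losses: Sequence[Sequence[float]], learner: Sequence[int]) -> float:
    """Learner's loss minus the loss of the best single fixed policy."""
    num_policies = len(losses[0])
    best_fixed = min(sum(row[k] for row in losses) for k in range(num_policies))
    return total_loss(losses, learner) - best_fixed


def dynamic_regret(
    losses: Sequence[Sequence[float]],
    learner: Sequence[int],
    comparators: Sequence[int],
) -> float:
    """Learner's loss minus the loss of a changing comparator sequence."""
    return total_loss(losses, learner) - total_loss(losses, comparators)


def num_switches(comparators: Sequence[int]) -> int:
    """Illustrative proxy for non-stationarity: how often the comparator changes."""
    return sum(1 for a, b in zip(comparators, comparators[1:]) if a != b)


# Toy data (assumed): losses[t][k] is the loss of policy k at round t.
# Policy 0 is good in the first half, policy 1 in the second half.
losses: List[List[float]] = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
learner = [0, 0, 0, 0]  # a learner that never adapts

# Strongest comparator sequence: the best policy in every round.
per_round_best = [min(range(len(row)), key=row.__getitem__) for row in losses]

print(static_regret(losses, learner))
print(dynamic_regret(losses, learner, per_round_best))
print(num_switches(per_round_best))
```

In this toy instance the non-adaptive learner looks perfect under static regret, because no single fixed policy does better over all rounds, yet it incurs positive dynamic regret against the comparator sequence that switches once. This is the sense in which dynamic regret is the stronger measure.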
