Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes

Sang Bin Moon; Abolfazl Hashemi

arXiv:2405.02188·stat.ML·May 6, 2024

Open Access

TL;DR

This paper introduces a variant of the Adversarial Markov Decision Process (AMDP) in which the learner has access to cost predictors, and obtains optimistic regret bounds that improve learning efficiency when costs change across episodes but do not evolve adversarially.

Contribution

It develops a novel policy search method with optimistic regret bounds for AMDPs, overcoming limitations of existing importance-weighted estimators and feedback models.

Findings

01 Achieves sublinear regret with high probability

02 Develops a new biased cost estimator leveraging predictors

03 Demonstrates effectiveness through numerical experiments

Abstract

The Adversarial Markov Decision Process (AMDP) is a learning framework that deals with unknown and varying tasks in decision-making applications like robotics and recommendation systems. A major limitation of the AMDP formalism, however, is that its regret analysis is pessimistic: although the cost function can change from one episode to the next, in many settings its evolution is not adversarial. To address this, we introduce and study a new variant of AMDP, which aims to minimize regret while utilizing a set of cost predictors. For this setting, we develop a new policy search method that achieves a sublinear optimistic regret with high probability, that is, a regret bound that degrades gracefully with the estimation power of the cost predictors. Establishing such optimistic regret bounds is nontrivial given that (i) as we demonstrate, the existing importance-weighted cost…
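To make the idea of combining bandit feedback with a cost predictor concrete, here is a minimal sketch of a generic optimistic importance-weighted estimator of the kind used in the optimistic bandit literature. It is not the estimator proposed in the paper, which this page does not describe. The function names, the single-state (bandit) setting, and the `gamma` smoothing parameter are all assumptions made for illustration.

```python
from typing import List


def optimistic_iw_estimate(
    probs: List[float],
    predicted: List[float],
    played: int,
    observed_cost: float,
    gamma: float = 0.0,
) -> List[float]:
    """Estimate the cost of every action from one bandit observation.

    Hypothetical helper, not taken from the paper. Each action starts at
    its predicted cost; only the played action is corrected by the
    importance-weighted prediction error. With gamma > 0 the correction
    is shrunk, which makes the estimator biased toward the predictor.
    """
    estimate = list(predicted)
    error = observed_cost - predicted[played]
    estimate[played] += error / (probs[played] + gamma)
    return estimate


def expected_estimate(
    probs: List[float],
    predicted: List[float],
    costs: List[float],
    gamma: float = 0.0,
) -> List[float]:
    """Exact expectation of the estimator over the random played action."""
    expectation = [0.0] * len(probs)
    for action, p in enumerate(probs):
        est = optimistic_iw_estimate(probs, predicted, action, costs[action], gamma)
        for i, value in enumerate(est):
            expectation[i] += p * value
    return expectation


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]       # policy over three actions (assumed)
    costs = [0.2, 0.9, 0.4]       # true costs, unknown to the learner
    predicted = [0.1, 0.8, 0.6]   # predictor output (assumed)
    print(expected_estimate(probs, predicted, costs))
    print(expected_estimate(probs, predicted, costs, gamma=0.1))
```

With `gamma=0` the estimator is unbiased for any predictor, and its correction term vanishes entirely when the predictor is exact, which is the intuition behind bounds that scale with prediction error. A positive `gamma` trades a bias toward the predictor for a bounded correction.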

Taxonomy

Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Distributed Sensor Networks and Detection Algorithms

Methods: Sparse Evolutionary Training