Refined Regret for Adversarial MDPs with Linear Function Approximation

Yan Dai; Haipeng Luo; Chen-Yu Wei; Julian Zimmert

arXiv:2301.12942 · cs.LG · June 5, 2023

TL;DR

This paper introduces two algorithms for adversarial MDPs with linear function approximation that improve the best known regret from $\tilde{O}(K^{2/3})$ to $\tilde{O}(\sqrt{K})$ given access to a simulator, and it extends the approach to simulator-free linear MDPs.

Contribution

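The paper presents two algorithms for adversarial MDPs with linear function approximation. The first rests on a refined analysis of Follow-the-Regularized-Leader (FTRL) with the log-barrier regularizer that allows arbitrarily negative loss estimators; the second develops a magnitude-reduced loss estimator. The results also cover simulator-free scenarios.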

Findings

01. Achieves $\tilde{O}(\sqrt{K})$ regret with the first algorithm.

02. Develops a magnitude-reduced loss estimator for better regret bounds.

03. Extends to simulator-free linear MDPs with improved regret bounds.

Abstract

We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes and the state space can be arbitrarily large. We assume that the Q-function of any policy is linear in some known features, that is, a linear function approximation exists. The best existing regret upper bound for this setting (Luo et al., 2021) is of order $\tilde{O}(K^{2/3})$ (omitting all other dependencies), given access to a simulator. This paper provides two algorithms that improve the regret to $\tilde{O}(\sqrt{K})$ in the same setting. Our first algorithm makes use of a refined analysis of the Follow-the-Regularized-Leader (FTRL) algorithm with the log-barrier regularizer. This analysis allows the loss estimators to be arbitrarily negative and might be of independent interest. Our second algorithm develops a magnitude-reduced…
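
To make the FTRL step mentioned in the abstract concrete, here is a minimal sketch of a generic log-barrier FTRL update over a probability simplex. It is not the paper's algorithm: the function name `log_barrier_ftrl`, the learning rate `eta`, and the bisection solver are assumptions chosen for illustration. The update solves $\min_{p \in \Delta} \langle p, \hat{L} \rangle + \frac{1}{\eta} \sum_i -\log p_i$, whose solution has the form $p_i = 1 / (\eta(\hat{L}_i + \mu))$ for a normalizing multiplier $\mu$.

```python
def log_barrier_ftrl(cum_loss, eta, iters=200):
    """Generic log-barrier FTRL step on the simplex (illustrative sketch).

    Returns the minimizer of <p, cum_loss> + (1/eta) * sum_i -log(p_i)
    over probability vectors p. `cum_loss` holds cumulative loss
    estimates and may contain arbitrarily negative values.
    """
    n = len(cum_loss)
    base = min(cum_loss)
    # Shifting all losses by a constant does not change the minimizer.
    shifted = [x - base for x in cum_loss]

    # First-order condition: p_i = 1 / (eta * (shifted_i + mu)).
    # The sum of p_i decreases in mu; it is >= 1 at mu = 1/eta and
    # <= 1 at mu = n/eta, so the root lies in that bracket.
    lo, hi = 1.0 / eta, n / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(1.0 / (eta * (s + mid)) for s in shifted)
        if total > 1.0:
            lo = mid
        else:
            hi = mid

    p = [1.0 / (eta * (s + hi)) for s in shifted]
    z = sum(p)  # remove residual bisection error
    return [x / z for x in p]


# Example call with one negative cumulative loss estimate.
probs = log_barrier_ftrl([0.0, 5.0, -3.0], eta=0.5)
print(probs)
```

Because only the differences between cumulative losses matter, the update stays well defined when estimates are very negative. The regret guarantee under such estimators is the subject of the paper's analysis and is not reproduced by this sketch.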

Videos

Refined Regret for Adversarial MDPs with Linear Function Approximation · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning