Beating Adversarial Low-Rank MDPs with Unknown Transition and Bandit Feedback

Haolin Liu; Zakaria Mhammedi; Chen-Yu Wei; Julian Zimmert

arXiv:2411.06739·cs.LG·November 12, 2024

Open Access

TL;DR

This paper advances regret minimization in low-rank MDPs with unknown transitions under both full-information and bandit feedback, introducing improved bounds and new algorithms, including computationally efficient ones, for challenging settings.

Contribution

The paper improves the regret bound for low-rank MDPs with full-information feedback and unknown transitions, and introduces the first algorithms for bandit feedback with unknown transitions, including oracle-efficient methods.

Findings

01

Improved the regret bound to $\mathrm{poly}(d, A, H)\,T^{2/3}$ for the full-information setting.

02

Proposed model-based and model-free algorithms for bandit feedback with unknown transitions.

03

Established that a linear loss structure is necessary for regret bounds that do not depend on the number of states.

Abstract

We consider regret minimization in low-rank MDPs with fixed transitions and adversarial losses. Previous work has investigated this problem under either full-information loss feedback with unknown transitions (Zhao et al., 2024), or bandit loss feedback with known transitions (Foster et al., 2022). First, we improve the $\mathrm{poly}(d, A, H)\,T^{5/6}$ regret bound of Zhao et al. (2024) to $\mathrm{poly}(d, A, H)\,T^{2/3}$ for the full-information unknown-transition setting, where $d$ is the rank of the transitions, $A$ is the number of actions, $H$ is the horizon length, and $T$ is the number of episodes. Next, we initiate the study of the setting with bandit loss feedback and unknown transitions. Assuming that the loss has a linear structure, we propose both model-based and model-free algorithms achieving $\mathrm{poly}(d, A, H)\,T^{2/3}$ regret, though they are computationally inefficient. We also propose oracle-efficient…
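To give a rough sense of what moving from $T^{5/6}$ to $T^{2/3}$ means, the sketch below compares only the $T$-dependent parts of the two bounds. It is an illustration, not anything taken from the paper: the helper names `rate` and `improvement_factor` are hypothetical, and the $\mathrm{poly}(d, A, H)$ factors are ignored entirely (they may differ between the two results, so the true gap could be larger or smaller).

```python
import math


def rate(T: int, exponent: float) -> float:
    """T**exponent: the T-dependent part of a poly(d, A, H) * T**exponent bound.

    Assumption: the poly(d, A, H) prefactor is dropped for illustration.
    """
    return float(T) ** exponent


def improvement_factor(T: int) -> float:
    """Ratio of the T^(5/6) rate to the T^(2/3) rate, which equals T^(1/6)."""
    return rate(T, 5 / 6) / rate(T, 2 / 3)


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**9):
        # Average regret per episode under the T^(2/3) rate shrinks like T^(-1/3).
        per_episode = rate(T, 2 / 3) / T
        print(f"T={T:>10}  factor={improvement_factor(T):.2f}  "
              f"per-episode T^(2/3) rate={per_episode:.4f}")
```

Under these simplifications the ratio grows as $T^{1/6}$, so the gain from the better exponent is modest for small $T$ and widens slowly as the number of episodes increases.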

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Cryptographic Implementations and Security