Beating Adversarial Low-Rank MDPs with Unknown Transition and Bandit Feedback
Haolin Liu, Zakaria Mhammedi, Chen-Yu Wei, Julian Zimmert

TL;DR
This paper advances regret minimization in low-rank MDPs with unknown transitions under both full-information and bandit feedback, introducing improved bounds and new algorithms, including computationally efficient ones, for challenging settings.
Contribution
It improves the regret bound for the full-information, unknown-transition setting and introduces the first algorithms for bandit feedback with unknown transitions, including oracle-efficient methods.
Findings
Improved the regret bound to $\mathrm{poly}(d, A, H)\,T^{2/3}$ for the full-information setting.
Proposed model-based and model-free algorithms for bandit feedback with unknown transitions.
Established that a linear loss structure is necessary for regret bounds that do not depend on the number of states.
Abstract
We consider regret minimization in low-rank MDPs with fixed transition and adversarial losses. Previous work has investigated this problem under either full-information loss feedback with unknown transitions (Zhao et al., 2024), or bandit loss feedback with known transitions (Foster et al., 2022). First, we improve the regret bound of Zhao et al. (2024) to $\mathrm{poly}(d, A, H)\,T^{2/3}$ for the full-information unknown transition setting, where $d$ is the rank of the transitions, $A$ is the number of actions, $H$ is the horizon length, and $T$ is the number of episodes. Next, we initiate the study on the setting with bandit loss feedback and unknown transitions. Assuming that the loss has a linear structure, we propose both model-based and model-free algorithms achieving regret, though they are computationally inefficient. We also propose oracle-efficient…
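To make the $T^{2/3}$ rate concrete, the following is a minimal sketch of what a cumulative regret bound of the form $C \cdot T^{2/3}$ implies for the average per-episode regret, which then shrinks like $C \cdot T^{-1/3}$. The constant `C` is a hypothetical stand-in for the $\mathrm{poly}(d, A, H)$ factor, whose exact form is not given in this summary, and the function names are illustrative only; none of this is the paper's algorithm.

```python
import math


def average_regret_bound(T: int, C: float = 1.0, exponent: float = 2 / 3) -> float:
    """Per-episode bound implied by a cumulative regret bound C * T**exponent.

    C is a hypothetical placeholder for the poly(d, A, H) factor.
    """
    return C * T ** (exponent - 1)


def episodes_needed(eps: float, C: float = 1.0, exponent: float = 2 / 3) -> int:
    """Approximate number of episodes T at which C * T**(exponent - 1) drops to eps.

    Solving C * T**(exponent - 1) = eps gives T = (C / eps) ** (1 / (1 - exponent)),
    which is (C / eps) ** 3 for the T**(2/3) rate.
    """
    return math.ceil((C / eps) ** (1 / (1 - exponent)))


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**9):
        print(T, average_regret_bound(T))
    print(episodes_needed(0.1))
```

Under these assumptions, halving the target average regret multiplies the required number of episodes by roughly eight, since the requirement scales as $(C/\varepsilon)^3$.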
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Cryptographic Implementations and Security
