Best-of-Both-Worlds for Heavy-Tailed Markov Decision Processes

Yu Chen; Yuhao Liu; Jiatai Huang; Yihan Du; Longbo Huang

arXiv:2602.01295 · cs.LG · May 18, 2026

TL;DR

This paper introduces algorithms for heavy-tailed Markov Decision Processes that adapt to both adversarial and stochastic environments, achieving $\tilde{O}(T^{1/\alpha})$ regret in adversarial regimes and $O(\log T)$ regret in stochastic regimes.

Contribution

The paper proposes the HT-FTRL-OM and HT-FTRL-UOB algorithms that attain Best-of-Both-Worlds guarantees for heavy-tailed MDPs, including unknown transition settings.

Findings

01

Achieves $\tilde{O}(T^{1/\alpha})$ regret in adversarial regimes.

02

Achieves $O(\log T)$ regret in stochastic regimes.

03

Develops novel estimators and analysis techniques for heavy-tailed, adversarial, and stochastic environments.
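To give a rough feel for the gap between the two rates listed above, the sketch below tabulates $T^{1/\alpha}$ against $\log T$ for a few horizons. The values of `alpha` and `T` are illustrative assumptions, and all constants and logarithmic factors are dropped, so the numbers indicate growth only and are not the paper's bounds.

```python
import math


def adversarial_rate(T: int, alpha: float) -> float:
    """Leading-order T^(1/alpha) term; constants and log factors are omitted."""
    return T ** (1.0 / alpha)


def stochastic_rate(T: int) -> float:
    """Leading-order log(T) term; instance-dependent constants are omitted."""
    return math.log(T)


if __name__ == "__main__":
    # alpha values here are assumptions chosen for illustration only.
    for alpha in (1.5, 2.0):
        for T in (10**3, 10**5, 10**7):
            print(
                f"alpha={alpha}, T={T}: "
                f"T^(1/alpha)={adversarial_rate(T, alpha):.1f}, "
                f"log T={stochastic_rate(T):.1f}"
            )
```

A smaller `alpha` makes the polynomial rate grow faster, while the logarithmic rate does not depend on `alpha` at this level of detail.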

Abstract

We investigate episodic Markov Decision Processes with heavy-tailed losses (HTMDPs). Existing approaches for HTMDPs are conservative in stochastic environments and lack adaptivity in adversarial regimes. In this work, we propose algorithms HT-FTRL-OM and HT-FTRL-UOB for HTMDPs that achieve Best-of-Both-Worlds (BoBW) guarantees: instance-independent regret in adversarial environments and logarithmic instance-dependent regret in self-bounding environments (including the stochastic case). For the known transition setting, HT-FTRL-OM applies the Follow-The-Regularized-Leader (FTRL) framework over occupancy measures with novel skipping loss estimators, achieving a $\tilde{O}(T^{1/\alpha})$ regret bound in adversarial regimes and an $O(\log T)$ regret bound in stochastic regimes. Building upon this framework, we develop a novel algorithm HT-FTRL-UOB to tackle the more challenging…
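The abstract mentions skipping loss estimators without giving their form. As a rough illustration of the general idea only, the sketch below shows a generic skipping importance-weighted estimator in a simplified single-state (bandit) setting: an observed loss is discarded when its magnitude exceeds a threshold, otherwise it is divided by the probability of the chosen action. The function name `skipping_estimate`, the fixed `threshold`, and the Pareto loss model are all assumptions made for this example; this is not the paper's estimator, which operates over occupancy measures.

```python
import random
from typing import List


def skipping_estimate(
    loss: float, action: int, probs: List[float], threshold: float
) -> List[float]:
    """Importance-weighted loss estimate that skips overly large observations.

    Hypothetical helper for illustration: returns a vector that is zero
    everywhere except at `action`, and zero there too if |loss| > threshold.
    """
    est = [0.0] * len(probs)
    if abs(loss) <= threshold:
        est[action] = loss / probs[action]
    return est


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.5, 0.25, 0.25]  # assumed fixed sampling distribution
    kept = 0
    for _ in range(1000):
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # Pareto samples with shape 1.5 have a finite mean but infinite variance.
        loss = rng.paretovariate(1.5)
        est = skipping_estimate(loss, action, probs, threshold=20.0)
        kept += any(v != 0.0 for v in est)
    print(f"kept {kept} of 1000 observations")
```

Skipping keeps each estimate bounded by `threshold / probs[action]`, at the cost of a bias from the discarded observations; how that trade-off is controlled is where the paper's analysis would come in.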

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning