Best-of-Both-Worlds for Heavy-Tailed Markov Decision Processes
Yu Chen, Yuhao Liu, Jiatai Huang, Yihan Du, Longbo Huang

TL;DR
This paper introduces new algorithms for heavy-tailed Markov Decision Processes that adaptively perform well in both adversarial and stochastic environments, achieving optimal regret bounds.
Contribution
The paper proposes the HT-FTRL-OM and HT-FTRL-UOB algorithms that attain Best-of-Both-Worlds guarantees for heavy-tailed MDPs, including unknown transition settings.
Findings
Achieves $\tilde{O}(T^{1/\beta})$ regret in adversarial regimes.
Achieves $O(\log T)$ regret in stochastic regimes.
Develops novel estimators and analysis techniques for heavy-tailed, adversarial, and stochastic environments.
Abstract
We investigate episodic Markov Decision Processes with heavy-tailed losses (HTMDPs). Existing approaches for HTMDPs are conservative in stochastic environments and lack adaptivity in adversarial regimes. In this work, we propose algorithms HT-FTRL-OM and HT-FTRL-UOB for HTMDPs that achieve Best-of-Both-Worlds (BoBW) guarantees: instance-independent regret in adversarial environments and logarithmic instance-dependent regret in self-bounding (including the stochastic case) environments. For the known transition setting, HT-FTRL-OM applies the Follow-The-Regularized-Leader (FTRL) framework over occupancy measures with novel skipping loss estimators, achieving a $\tilde{O}(T^{1/\beta})$ regret bound in adversarial regimes and an $O(\log T)$ regret in stochastic regimes. Building upon this framework, we develop a novel algorithm HT-FTRL-UOB to tackle the more challenging…
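The abstract mentions skipping loss estimators without giving their form. As a rough illustration of the general idea only, the sketch below shows a simple importance-weighted estimate that discards losses above a cutoff. It is not the estimator used by HT-FTRL-OM: the function names, the fixed `threshold`, and the Pareto loss model are all assumptions made for this example.

```python
import random


def skipping_loss_estimate(loss, prob, threshold):
    """Importance-weighted loss estimate that skips very large losses.

    loss:      observed loss of the action that was played
    prob:      probability with which that action was played (must be > 0)
    threshold: hypothetical cutoff (an assumption of this sketch); a loss
               with |loss| > threshold is replaced by 0 before weighting
    """
    if prob <= 0:
        raise ValueError("prob must be positive")
    kept = loss if abs(loss) <= threshold else 0.0
    return kept / prob


def demo(num_rounds=10_000, shape=1.5, prob=0.25, threshold=50.0, seed=0):
    """Return the largest plain and largest skipping estimate seen.

    Losses are drawn from a Pareto distribution with the given shape;
    shape = 1.5 gives a finite mean but infinite variance (heavy tail).
    """
    rng = random.Random(seed)
    plain_max = 0.0
    skip_max = 0.0
    for _ in range(num_rounds):
        loss = rng.paretovariate(shape)
        plain_max = max(plain_max, loss / prob)
        skip_max = max(skip_max, skipping_loss_estimate(loss, prob, threshold))
    return plain_max, skip_max


if __name__ == "__main__":
    plain_max, skip_max = demo()
    print(f"largest plain estimate:    {plain_max:.1f}")
    print(f"largest skipping estimate: {skip_max:.1f}")
```

By construction the skipping estimate never exceeds `threshold / prob`, while the plain importance-weighted estimate has no such bound under heavy tails. The price is bias from the discarded losses, which any regret analysis has to control; how the paper chooses its skipping rule is not stated in the abstract.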
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
