Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
Runlong Zhou, Zihan Zhang, Simon S. Du

TL;DR
This paper introduces variance-dependent regret bounds for MDPs that adapt to the variance of the environment. The bounds are simultaneously minimax optimal in both stochastic and deterministic settings and are obtained through new algorithms and analysis techniques.
Contribution
It proposes two new environment norms to characterize variance and develops a variance-dependent variant of the MVP algorithm. It also initiates the study of variance-aware model-free algorithms, with matching lower bounds.
Findings
The variance-dependent bound is simultaneously minimax optimal for both stochastic and deterministic MDPs, the first result of its kind.
The two new environment norms capture fine-grained variance properties of the environment.
The proposed algorithms improve on existing methods, which are either variance-independent or suboptimal.
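The abstract does not define the two environment norms, so the sketch below does not reproduce them. It only computes a quantity commonly used in variance-aware RL analyses: the variance of the next-state value V(s') under the transition distribution P(· | s, a). The function name `next_state_value_variance` and the tabular array layout are illustrative assumptions. The sketch shows why deterministic MDPs sit at the low-variance extreme: every transition is a point mass, so this variance is zero everywhere.

```python
import numpy as np

def next_state_value_variance(P, V):
    """Variance of V(s') for s' ~ P[s, a], for every (s, a) pair.

    P: array of shape (S, A, S); each P[s, a] is a probability vector (assumed layout).
    V: array of shape (S,), a value function.
    Returns an array of shape (S, A).
    """
    mean = P @ V                 # E[V(s')], shape (S, A)
    second = P @ (V ** 2)        # E[V(s')^2], shape (S, A)
    # Var = E[V^2] - (E[V])^2; clip small negative values caused by rounding.
    return np.maximum(second - mean ** 2, 0.0)

S, A = 3, 2
V = np.array([0.0, 1.0, 2.0])

# Deterministic MDP: action a in state s always leads to state (s + a) % S.
P_det = np.zeros((S, A, S))
for s in range(S):
    for a in range(A):
        P_det[s, a, (s + a) % S] = 1.0

# Maximally stochastic MDP: uniform next-state distribution.
P_unif = np.full((S, A, S), 1.0 / S)

print(next_state_value_variance(P_det, V))   # all zeros
print(next_state_value_variance(P_unif, V))  # every entry equals 2/3
```

Taking the maximum or a visitation-weighted sum of this table is one crude way to summarize environment variance. It should not be read as either of the norms the paper proposes.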
Abstract
We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit environments with low variance (e.g., enjoying constant regret on deterministic MDPs). The existing algorithms are either variance-independent or suboptimal. We first propose two new environment norms to characterize the fine-grained variance properties of the environment. For model-based methods, we design a variant of the MVP algorithm (Zhang et al., 2021a). We apply new analysis techniques to demonstrate that this algorithm enjoys variance-dependent bounds with respect to the norms we propose. In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs, the first result of its kind. We further initiate the study of model-free algorithms with variance-dependent regret bounds by…
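Variance-aware model-based methods such as MVP (Zhang et al., 2021a) rely on Bernstein-type exploration bonuses, whose leading term scales with an empirical variance. The function below is a minimal, hypothetical sketch of a generic empirical-Bernstein bonus for a single (s, a) pair. It is not the paper's variant or MVP's exact bonus. The name `bernstein_bonus`, the log factor `log(3 / delta)`, and the constants `c1` and `c2` are illustrative assumptions.

```python
import math

def bernstein_bonus(values, H, delta, c1=1.0, c2=1.0):
    """Hypothetical empirical-Bernstein bonus for one (s, a) pair.

    values: observed next-state values V(s') from n visits to (s, a).
    H: assumed upper bound on the range of the values.
    delta: failure probability.
    c1, c2: illustrative constants, not taken from MVP or the paper.
    """
    n = len(values)
    if n == 0:
        return float(H)  # no data yet: fall back to the trivial range bound
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # empirical (biased) variance
    log_term = math.log(3.0 / delta)                # assumed log factor
    # Variance-dependent sqrt(1/n) term plus a lower-order 1/n range term.
    return c1 * math.sqrt(var * log_term / n) + c2 * H * log_term / n

deterministic = [1.0] * 100        # every visit yields the same next-state value
stochastic = [0.0, 10.0] * 50      # values alternate between 0 and 10

print(bernstein_bonus(deterministic, H=10.0, delta=0.1))
print(bernstein_bonus(stochastic, H=10.0, delta=0.1))
```

When every observed next-state value is identical, as in a deterministic MDP, the square-root term vanishes and only the lower-order H·log/n term remains. This gives one intuition for why low-variance environments can admit much smaller regret.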
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
