Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

Runlong Zhou; Zihan Zhang; Simon S. Du

arXiv:2301.13446 · cs.LG · May 23, 2023

TL;DR

This paper introduces variance-dependent regret bounds for MDPs that adapt to the variance of the environment. Using a variant of the MVP algorithm and new analysis techniques, it obtains a bound that is minimax optimal in both stochastic and deterministic settings.

Contribution

It proposes two new environment norms that characterize fine-grained variance properties, designs a variant of the MVP algorithm with variance-dependent bounds, and initiates the study of variance-aware model-free algorithms with matching lower bounds.

Findings

01

The variance-dependent bound for the MVP variant is simultaneously minimax optimal for stochastic and deterministic MDPs.

02

Two new environment norms characterize the fine-grained variance properties of the environment.

03

The proposed algorithms improve on existing methods, which are either variance-independent or suboptimal in their variance-dependent regret.

Abstract

We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit environments with low variance (e.g., enjoying constant regret on deterministic MDPs). The existing algorithms are either variance-independent or suboptimal. We first propose two new environment norms to characterize the fine-grained variance properties of the environment. For model-based methods, we design a variant of the MVP algorithm (Zhang et al., 2021a). We apply new analysis techniques to demonstrate that this algorithm enjoys variance-dependent bounds with respect to the norms we propose. In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs, the first result of its kind. We further initiate the study on model-free algorithms with variance-dependent regret bounds by…
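To make the notion of a low-variance environment concrete, here is a minimal sketch that computes the variance of the next-state value, Var over s' drawn from P(.|s,a) of V(s'), for a toy two-state MDP. It is an illustration only and not the environment norms proposed in the paper, which this page does not define. The function name, the array layout, and the toy transition kernels are assumptions made for this example.

```python
import numpy as np


def next_state_value_variance(P, V):
    """Variance of V(s') under s' ~ P(.|s, a), for every pair (s, a).

    P: array of shape (S, A, S); each P[s, a] is a probability vector.
    V: array of shape (S,); an arbitrary value function (hypothetical input).
    """
    mean = P @ V             # E[V(s')], shape (S, A)
    second = P @ (V ** 2)    # E[V(s')^2], shape (S, A)
    # Clip tiny negative values caused by floating-point round-off.
    return np.maximum(second - mean ** 2, 0.0)


V = np.array([0.0, 1.0])

# Deterministic toy MDP: action a always leads to state a.
P_det = np.zeros((2, 2, 2))
P_det[:, 0, 0] = 1.0
P_det[:, 1, 1] = 1.0

# Stochastic toy MDP: every action leads to a uniformly random state.
P_sto = np.full((2, 2, 2), 0.5)

var_det = next_state_value_variance(P_det, V)
var_sto = next_state_value_variance(P_sto, V)
```

In this sketch `var_det` is zero for every state-action pair, while `var_sto` equals 0.25 everywhere. A variance-dependent regret bound is designed to shrink as quantities of this kind shrink, which is why it can be much smaller on deterministic MDPs than a variance-independent bound.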

Videos

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management