Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

Yeoneung Kim; Insoon Yang; Kwang-Sung Jun

arXiv:2111.03289 · stat.ML · February 7, 2023

TL;DR

This paper tightens the regret bounds of Zhang et al. (2021) for variance-adaptive linear bandits and horizon-free linear mixture MDPs through new analyses that reduce the polynomial dependence on the dimension d.

Contribution

The paper presents new analyses that substantially tighten regret bounds for variance-adaptive linear bandits and linear mixture MDPs, leveraging a novel peeling-based approach.

Findings

01

Achieves an Õ(min{d√K, d^{1.5}√(∑σ_k^2)} + d^2) regret bound for linear bandits.

02

Attains a horizon-free regret bound of Õ(d√K + d^2) for linear mixture MDPs.

03

Improves on the previous bounds of Zhang et al. (2021) by factors of d^3 (linear bandits) and d^{3.5} (linear mixture MDPs).
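
To make the linear bandit bound concrete, the following minimal sketch evaluates the expression min{d√K, d^{1.5}√(∑σ_k^2)} + d^2 for two noise regimes. It is only an order-level illustration: constants and polylogarithmic factors are dropped, and the function name `linear_bandit_bound` and the example values of d, K, and the variances are assumptions chosen for illustration, not taken from the paper.

```python
import math


def linear_bandit_bound(d, variances):
    """Order-level value of min{d*sqrt(K), d^1.5*sqrt(sum sigma_k^2)} + d^2.

    Constants and polylogarithmic factors are dropped, so the result is
    only meaningful for comparing regimes, not as a numeric guarantee.
    """
    K = len(variances)  # time horizon = number of rounds
    worst_case = d * math.sqrt(K)
    variance_adaptive = d ** 1.5 * math.sqrt(sum(variances))
    return min(worst_case, variance_adaptive) + d ** 2


# Hypothetical example values (not from the paper).
d, K = 4, 10_000
low_noise = [1e-4] * K   # total variance is about 1
high_noise = [1.0] * K   # total variance equals K

low = linear_bandit_bound(d, low_noise)
high = linear_bandit_bound(d, high_noise)
print(f"low-noise bound:  {low:.1f}")
print(f"high-noise bound: {high:.1f}")
```

In this sketch the variance-dependent term is the smaller of the two exactly when ∑σ_k^2 ≤ K/d, which is why the low-noise regime gives a much smaller value than the worst-case d√K term.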

Abstract

In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees yet is challenging because variances are often not known a priori. Recently, considerable progress has been made by Zhang et al. (2021) where they obtain a variance-adaptive regret bound for linear bandits without knowledge of the variances and a horizon-free regret bound for linear mixture Markov decision processes (MDPs). In this paper, we present novel analyses that improve their regret bounds significantly. For linear bandits, we achieve $\tilde{O}(\min\{d\sqrt{K}, d^{1.5}\sqrt{\sum_{k=1}^{K} \sigma_{k}^{2}}\} + d^{2})$ where $d$ is the dimension of the features, $K$ is the time horizon, and $\sigma_{k}^{2}$ is the noise variance at time step $k$, and $\tilde{O}$ ignores polylogarithmic dependence, which is a factor of $d^{3}$ improvement. For linear mixture MDPs with the…

Videos

Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics