Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs
Yeoneung Kim, Insoon Yang, Kwang-Sung Jun

TL;DR
This paper improves regret bounds for variance-adaptive linear bandits and horizon-free linear mixture MDPs by introducing novel analyses that significantly reduce the dependence on problem dimensions and time horizon.
Contribution
The paper presents new analyses that substantially tighten regret bounds for variance-adaptive linear bandits and linear mixture MDPs, leveraging a novel peeling-based approach.
Findings
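Achieves an Õ(min{d√K, d^{1.5}√(∑σ_k^2)} + d^2) regret bound for linear bandits, where Õ hides polylogarithmic factors.
Attains a horizon-free regret bound of Õ(d√K + d^2) for linear mixture MDPs.
Improves on the previous bounds of Zhang et al. (2021) by a factor of d^3 for linear bandits and d^{3.5} for linear mixture MDPs.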
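To see how the variance-adaptive bound behaves, the following is a minimal sketch that evaluates the scaling of the two bounds numerically. It assumes the linear bandit bound has the form min{d√K, d^{1.5}√(∑σ_k^2)} + d^2 and drops all constants and polylogarithmic factors, so only relative magnitudes are meaningful. The function names `bandit_bound` and `mdp_bound` are hypothetical and chosen for illustration; they are not from the paper.

```python
import math


def bandit_bound(d, variances):
    """Scaling of min{d*sqrt(K), d^1.5*sqrt(sum of sigma_k^2)} + d^2.

    Constants and polylog factors are dropped (an assumption of this sketch).
    """
    K = len(variances)
    worst_case = d * math.sqrt(K)                         # variance-independent term
    variance_term = d ** 1.5 * math.sqrt(sum(variances))  # variance-adaptive term
    return min(worst_case, variance_term) + d ** 2


def mdp_bound(d, K):
    """Scaling of the horizon-free bound d*sqrt(K) + d^2 (no dependence on horizon)."""
    return d * math.sqrt(K) + d ** 2


d, K = 10, 10_000
low_noise = bandit_bound(d, [1e-4] * K)   # variances sum to about 1
unit_noise = bandit_bound(d, [1.0] * K)   # variances sum to K
print(low_noise, unit_noise, mdp_bound(d, K))
```

With small total variance the variance-adaptive term is the smaller one and the bound is dominated by d^2; with unit variance at every step the min falls back to the d√K term.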
Abstract
In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees yet is challenging because variances are often not known a priori. Recently, considerable progress has been made by Zhang et al. (2021) where they obtain a variance-adaptive regret bound for linear bandits without knowledge of the variances and a horizon-free regret bound for linear mixture Markov decision processes (MDPs). In this paper, we present novel analyses that improve their regret bounds significantly. For linear bandits, we achieve Õ(min{d√K, d^{1.5}√(∑_{k=1}^K σ_k^2)} + d^2) where d is the dimension of the features, K is the time horizon, and σ_k^2 is the noise variance at time step k, and Õ ignores polylogarithmic dependence, which is a factor of d^3 improvement. For linear mixture MDPs with the…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
