Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

Heyang Zhao; Jiafan He; Dongruo Zhou; Tong Zhang; Quanquan Gu

arXiv:2302.10371·cs.LG·February 22, 2023·1 cites

Abstract

Recently, several studies (Zhou et al., 2021a; Zhang et al., 2021b; Kim et al., 2021; Zhou and Gu, 2022) have provided variance-dependent regret bounds for linear contextual bandits, which interpolate between the regret of the worst-case regime and that of the deterministic reward regime. However, these algorithms are either computationally intractable or unable to handle unknown noise variance. In this paper, we resolve this open problem by proposing the first computationally efficient algorithm for linear bandits with heteroscedastic noise. Our algorithm is adaptive to the unknown variance of the noise and achieves an $\tilde{O}(d \sqrt{\sum_{k=1}^{K} \sigma_{k}^{2}} + d)$ regret, where $\sigma_{k}^{2}$ is the variance of the noise at round $k$, $d$ is the dimension of the contexts, and $K$ is the total number of rounds. Our results are based on an adaptive variance-aware…
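
To make the interpolation claim concrete, the short sketch below evaluates the leading-order rate $d \sqrt{\sum_{k=1}^{K} \sigma_{k}^{2}} + d$ in the two extreme regimes. It is only an illustration: the function name `variance_dependent_rate` and the values of `d` and `K` are assumptions chosen for the example, and constants and logarithmic factors hidden by $\tilde{O}$ are ignored.

```python
import math


def variance_dependent_rate(d, sigmas):
    """Leading-order rate d * sqrt(sum_k sigma_k^2) + d.

    Hypothetical helper for illustration only: it drops the constants
    and log factors hidden inside the O-tilde notation.
    """
    return d * math.sqrt(sum(s * s for s in sigmas)) + d


d, K = 10, 10_000  # assumed dimension and horizon, for illustration

# Deterministic rewards: sigma_k = 0 for every round, so only the "+ d" term remains.
deterministic = variance_dependent_rate(d, [0.0] * K)

# Worst case with unit noise: sigma_k = 1 for every round, giving d * sqrt(K) + d.
worst_case = variance_dependent_rate(d, [1.0] * K)

print(deterministic, worst_case)
```

With zero noise the rate no longer grows with $K$, while with constant unit noise it recovers the familiar $d\sqrt{K}$ scaling; intermediate noise profiles fall between the two.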

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management