Tight Memory-Regret Lower Bounds for Streaming Bandits
Shaoang Li, Lan Zhang, Junhao Wang, Xiang-Yang Li

TL;DR
This paper establishes tight regret lower bounds for streaming bandits with limited arm memory, shows a separation from the classical stochastic bandit setting, and proposes an algorithm that nearly matches these bounds.
Contribution
The paper introduces the first tight regret lower bounds for streaming bandits with sublinear arm memory and provides an algorithm that nearly matches them.
Findings
Worst-case regret lower bound of Ω((TB)^{2^{B}/(2^{B+1}-1)} K^{1-2^{B}/(2^{B+1}-1)}) for streaming bandits with time horizon T, K arms, and B passes.
An unavoidable double-logarithmic factor compared to the classical Ω(√(KT)) lower bound.
The first instance-dependent lower bound, of order T^{1/(B+1)}, for streaming bandits.
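To make the worst-case exponent above concrete, the following minimal sketch evaluates 2^{B}/(2^{B+1}-1) for a few pass counts and shows how quickly it approaches 1/2. The function names `alpha` and `worst_case_rate` are illustrative choices, not names from the paper, and constant factors hidden by the lower bound are ignored.

```python
from fractions import Fraction


def alpha(passes: int) -> Fraction:
    """Exponent 2^B / (2^(B+1) - 1) of the worst-case bound, for B passes."""
    if passes < 1:
        raise ValueError("passes must be a positive integer")
    return Fraction(2**passes, 2 ** (passes + 1) - 1)


def worst_case_rate(horizon: float, arms: float, passes: int) -> float:
    """Evaluate (T*B)^alpha * K^(1 - alpha); constants are ignored (assumption)."""
    a = float(alpha(passes))
    return (horizon * passes) ** a * arms ** (1.0 - a)


if __name__ == "__main__":
    half = Fraction(1, 2)
    for b in range(1, 7):
        a = alpha(b)
        # The gap to 1/2 is exactly 1 / (2 * (2^(B+1) - 1)).
        print(f"B={b}: alpha={a} (about {float(a):.4f}), alpha - 1/2 = {a - half}")
```

With a single pass the exponent is 2/3, giving a rate of T^{2/3} K^{1/3}; each additional pass roughly halves the gap between the exponent and 1/2.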
Abstract
In this paper, we investigate the streaming bandits problem, wherein the learner aims to minimize regret by dealing with online arriving arms and sublinear arm memory. We establish the tight worst-case regret lower bound of Ω((TB)^{α} K^{1-α}), with α = 2^{B}/(2^{B+1}-1), for any algorithm with a time horizon T, number of arms K, and number of passes B. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known Ω(√(KT)) lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm with sublinear memory permitted. Furthermore, we establish the first instance-dependent lower bound of order T^{1/(B+1)} for…
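One way to see how the worst-case bound relates to the classical √(KT) rate is to split off the part of the exponent that exceeds 1/2. The algebra below is a short worked identity based on the exponent 2^{B}/(2^{B+1}-1) listed in the Findings, written here as α; the interpretation that follows is a reading of the formula, not a reproduction of the paper's proof.

```latex
\[
\alpha = \frac{2^{B}}{2^{B+1}-1}
\quad\Longrightarrow\quad
\alpha - \frac{1}{2} = \frac{1}{2\,(2^{B+1}-1)},
\]
\[
(TB)^{\alpha} K^{1-\alpha}
= \sqrt{KTB}\,\left(\frac{TB}{K}\right)^{\frac{1}{2\,(2^{B+1}-1)}} .
\]
```

For B = 1 this gives T^{2/3} K^{1/3}. Once 2^{B+1} is on the order of log T, that is, B on the order of log log T, the factor T^{1/(2(2^{B+1}-1))} is bounded by a constant, and the leading term is on the scale of √(KT log log T), which is consistent with the double logarithmic factor mentioned above.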
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
