Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap
Nikolai Karpov, Chen Wang

TL;DR
This paper establishes nearly tight bounds on the number of passes needed for exploration in streaming multi-armed bandits with known optimality gap, showing that $\Theta(\log n)$ passes are necessary and sufficient up to a $\log \log n$ factor.
Contribution
The paper provides the first nearly tight bounds on pass complexity for streaming multi-armed bandits with known gaps: a lower bound and a nearly matching algorithm.
Findings
Any algorithm with sublinear memory requires at least $\frac{\log n}{\log \log n}$ passes.
A nearly matching algorithm achieves the optimal sample complexity with a single-arm memory.
The results clarify the trade-offs between passes, memory, and sample complexity in streaming bandit exploration.
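To get a feel for how close the two pass bounds are, the following minimal sketch evaluates the lower-bound scale $\frac{\log n}{\log \log n}$ next to the $\log n$ scale from the TL;DR. The function names are hypothetical, natural logarithms are an assumption, and all constant factors are ignored, so the numbers show only growth rates, not actual pass counts.

```python
import math


def pass_lower_bound_scale(n: int) -> float:
    """Return log(n) / log(log(n)), the lower-bound scale with constants dropped.

    Assumes natural logarithms; requires n >= 3 so that log(log(n)) > 0.
    """
    if n < 3:
        raise ValueError("n must be at least 3 so that log(log(n)) is positive")
    return math.log(n) / math.log(math.log(n))


def pass_upper_bound_scale(n: int) -> float:
    """Return log(n), the upper-bound scale with constants dropped."""
    if n < 1:
        raise ValueError("n must be a positive number of arms")
    return math.log(n)


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        lower = pass_lower_bound_scale(n)
        upper = pass_upper_bound_scale(n)
        print(f"n={n:>10}  lower scale={lower:6.2f}  upper scale={upper:6.2f}")
```

For a million arms the two scales are roughly 5.3 and 13.8, so the remaining gap between the bounds is the $\log \log n$ factor.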
Abstract
We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with *a priori* knowledge of the optimality gap $\Delta_{[2]}$. Here, and throughout, the optimality gap $\Delta_{[i]}$ is defined as the mean reward gap between the best and the $i$-th best arms. A recent line of results by Jin, Huang, Tang, and Xiao [ICML'21] and Assadi and Wang [COLT'24] has shown that if $\Delta_{[2]}$ is not known, a pass complexity of $\Theta(\log(1/\Delta_{[2]}))$ (up to $\log\log(1/\Delta_{[2]})$ terms) is necessary and sufficient to obtain the *worst-case optimal* sample complexity of $O(n/\Delta_{[2]}^2)$ with a single-arm memory. However, our understanding of multi-pass algorithms with known $\Delta_{[2]}$ is still limited. Here, the key open problem is how many passes are required to achieve the complexity, i.e., $O(…
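The gap definition in the abstract can be made concrete with a small sketch. It computes the gap between the best arm and the $i$-th best arm from a list of mean rewards, and evaluates the usual worst-case scale $n/\Delta_{[2]}^2$ for best-arm identification. The function names and the notation $\Delta_{[i]}$ are assumptions made for illustration, constants and logarithmic factors are dropped, and in a real bandit problem the means are unknown to the algorithm.

```python
from typing import List, Sequence


def optimality_gaps(means: Sequence[float]) -> List[float]:
    """Return the gaps between the best mean and the i-th best mean, for i = 2..n.

    The first entry is the gap to the second-best arm (the "optimality gap").
    """
    if len(means) < 2:
        raise ValueError("need at least two arms to define a gap")
    ordered = sorted(means, reverse=True)
    best = ordered[0]
    return [best - m for m in ordered[1:]]


def worst_case_sample_scale(means: Sequence[float]) -> float:
    """Return n / gap^2 for the gap to the second-best arm (constants dropped)."""
    gap = optimality_gaps(means)[0]
    if gap <= 0:
        raise ValueError("the best arm must be unique for the gap to be positive")
    return len(means) / gap**2


if __name__ == "__main__":
    # Hypothetical instance with four arms; the arm with mean 0.9 is the best.
    arm_means = [0.5, 0.9, 0.8, 0.5]
    print("gaps:", optimality_gaps(arm_means))
    print("worst-case scale:", worst_case_sample_scale(arm_means))
```

The sketch sorts the means only to define the gaps; a streaming algorithm with single-arm memory cannot do this and must instead estimate the means from samples across passes.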
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Smart Grid Energy Management
