Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits
Yuchen He, Zichun Ye, Chihao Zhang

TL;DR
This paper characterizes the optimal regret for streaming stochastic multi-armed bandits with memory constraints, providing algorithms and bounds that depend on the number of passes, arms, and memory size.
Contribution
It offers a complete characterization of the optimal regret in the streaming multi-armed bandit problem with memory limitations, including matching upper and lower bounds.
Findings
Designed a multi-pass streaming algorithm whose regret bound depends on the number of passes, the number of arms, and the memory size.
Proved a matching lower bound for the regret.
Results are tight up to a logarithmic factor.
Abstract
We study the stochastic multi-armed bandit problem in the multi-pass streaming model. In this problem, the arms are presented in a stream, and only a bounded number of arms and their statistics can be stored in memory. We give a complete characterization of the optimal regret in terms of the number of passes, the number of arms, and the memory size. Specifically, we design an algorithm with an upper bound on the regret and complement it with a matching lower bound when the number of rounds is sufficiently large. Our results are tight up to a logarithmic factor.
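To make the memory constraint in the abstract concrete, the following is a minimal single-pass toy simulation. It is not the paper's algorithm: it is a naive explore-then-commit baseline, and every name in it (`naive_streaming_bandit`, `pulls_per_arm`, and so on) is a hypothetical choice for illustration. It assumes Bernoulli rewards and `memory >= 1`, and it reports pseudo-regret against the best arm in the stream.

```python
import random


def naive_streaming_bandit(means, memory, pulls_per_arm, horizon, seed=0):
    """Toy single-pass baseline (an illustration, not the paper's method).

    Arms arrive one at a time. Each arriving arm is pulled `pulls_per_arm`
    times, and at most `memory` arms (with their empirical means) are kept.
    Returns (sorted indices of stored arms, total pseudo-regret).
    """
    rng = random.Random(seed)
    best_mean = max(means)
    stored = {}          # arm index -> empirical mean, at most `memory` entries
    regret = 0.0
    rounds_used = 0

    for arm, mu in enumerate(means):
        if rounds_used + pulls_per_arm > horizon:
            break        # not enough rounds left to explore another arm
        wins = sum(rng.random() < mu for _ in range(pulls_per_arm))
        rounds_used += pulls_per_arm
        regret += pulls_per_arm * (best_mean - mu)
        estimate = wins / pulls_per_arm

        if len(stored) < memory:
            stored[arm] = estimate
        else:
            # Memory is full: replace the empirically worst stored arm
            # only if the new arm looks strictly better.
            worst = min(stored, key=stored.get)
            if estimate > stored[worst]:
                del stored[worst]
                stored[arm] = estimate

    # Commit to the empirically best stored arm for the remaining rounds.
    if stored:
        chosen = max(stored, key=stored.get)
        regret += (horizon - rounds_used) * (best_mean - means[chosen])
    return sorted(stored), regret
```

With a small `memory`, a good arm that arrives early can be evicted or misjudged after only a few pulls, which is the kind of tension between memory and regret that the paper quantifies. Allowing several passes over the stream would let such a baseline revisit arms it discarded.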
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Smart Grid Energy Management
