Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits
Ruiyuan Huang, Zicheng Lyu, Xiaoyi Zhu, Zengfeng Huang

TL;DR
This paper investigates the limitations and possibilities of stochastic multi-armed bandit algorithms under simultaneous constraints on memory and batch interactions, revealing fundamental trade-offs and proposing near-optimal algorithms.
Contribution
It establishes lower bounds on batch complexity with memory constraints and provides an algorithm nearly matching these bounds under certain conditions.
Findings
Any algorithm with W-bit memory needs at least Ω(K/W) batches for near-minimax regret.
Logarithmic memory rules out O(K^{1-ε}) batch complexity.
Proposed algorithm achieves regret close to the lower bound with W bits of memory.
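To make the first two findings concrete, the following is a minimal sketch that evaluates the Ω(K/W) bound numerically and compares it with a sublinear batch budget of K^{1-ε}. It is illustrative only: the hidden constant in Ω(·) is set to 1, "logarithmic memory" is taken to mean W = log2(T) bits, and the function names `batch_lower_bound` and `exceeds_sublinear_budget` are hypothetical, not taken from the paper.

```python
import math


def batch_lower_bound(num_arms: int, memory_bits: float) -> float:
    """Return K / W, i.e. the Omega(K/W) bound with its constant assumed to be 1."""
    if memory_bits <= 0:
        raise ValueError("memory_bits must be positive")
    return num_arms / memory_bits


def exceeds_sublinear_budget(num_arms: int, horizon: int, eps: float) -> bool:
    """Check whether K / W exceeds K^(1 - eps) when W = log2(T) bits (assumed)."""
    memory_bits = math.log2(horizon)
    return batch_lower_bound(num_arms, memory_bits) > num_arms ** (1 - eps)


if __name__ == "__main__":
    # K = 10^6 arms, T = 2^20 rounds, so W = 20 bits under the assumption above.
    print(batch_lower_bound(10**6, math.log2(2**20)))
    print(exceeds_sublinear_budget(10**6, 2**20, eps=0.5))
```

With constants ignored, the comparison only becomes meaningful once K^ε is large relative to W, which is why the small-K case in the sketch does not exceed the budget.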
Abstract
We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in batches and has only W bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret Õ(√(KT)) is achievable with O(log T) bits of memory under fully adaptive interaction, and with a K-independent O(log log T)-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a W-bit memory constraint must use at least Ω(K/W) batches to achieve near-minimax regret Õ(√(KT)), even under adaptive grids. In particular, logarithmic memory rules out O(K^{1-ε}) batch complexity. Our proof is based on an information bottleneck. We show…
