Multi-Armed Sampling Problem and the End of Exploration
Mohammad Pedramfar, Siamak Ravanbakhsh

TL;DR
This paper introduces multi-armed sampling, a framework for analyzing exploration in sampling tasks. Its results suggest that sampling needs far less exploration than optimization, with implications for entropy-regularized reinforcement learning and neural samplers.
Contribution
The paper formalizes multi-armed sampling, defines plausible notions of regret with corresponding lower bounds, proposes a simple algorithm with near-optimal regret, and connects sampling to bandit problems through a unifying temperature parameter.
Findings
The theoretical results suggest that sampling, in contrast to optimization, requires little to no exploration to achieve near-optimal regret.
The framework unifies sampling and bandit problems via a temperature parameter.
Results have implications for entropy-regularized reinforcement learning and neural samplers.
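To make the temperature idea concrete, here is a minimal sketch of how one target distribution can move between a sampling goal and a bandit-style goal. It assumes a softmax-style target with probabilities proportional to exp(reward / temperature); this parametrization, the function name `tempered_target`, and the reward values are illustrative assumptions and are not taken from the paper, whose exact definitions may differ.

```python
import math


def tempered_target(rewards, temperature):
    """Return probabilities proportional to exp(reward / temperature).

    Assumed form for illustration only; not the paper's definition.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [r / temperature for r in rewards]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical per-arm rewards, chosen only for this example.
rewards = [1.0, 0.5, 0.0]

for temperature in (1.0, 0.1, 0.01):
    probs = tempered_target(rewards, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Under this assumed form, a moderate temperature leaves mass on every arm, which corresponds to a sampling target, while a temperature near zero puts almost all mass on the highest-reward arm, which corresponds to the bandit objective of finding the best arm.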
Abstract
This paper introduces the framework of multi-armed sampling, which serves as the sampling counterpart to the optimization problem of multi-armed bandits. Our primary motivation is to rigorously examine the exploration-exploitation trade-off in the context of sampling. We systematically define plausible notions of regret for this framework and establish corresponding lower bounds. We then propose a simple algorithm that achieves near-optimal regret bounds. Our theoretical results suggest that, in contrast to optimization, sampling barely requires any exploration. To further connect our findings with those of multi-armed bandits, we define a continuous family of problems and associated regret measures that smoothly interpolate and unify multi-armed sampling and multi-armed bandit problems using a temperature parameter. We believe that the multi-armed sampling framework and our findings in…
