Information-directed sampling for bandits: a primer
Annika Hirling, Giorgio Nicoletti, Antonio Celani

TL;DR
This paper reviews Information Directed Sampling (IDS) policies for multi-armed bandits, analyzing their performance in simple models, extending to infinite horizons, and demonstrating bounded or logarithmic regret in key scenarios.
Contribution
It extends the IDS framework to discounted infinite-horizon bandits, using a modified information measure and a tuning parameter, and compares heuristic strategies against the optimal policy in minimal two-state Bernoulli models.
Findings
In symmetric bandits, IDS achieves bounded cumulative regret.
In the one-fair-coin case, IDS regret scales logarithmically with the horizon.
The paper bridges concepts from reinforcement learning and information theory.
Abstract
The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directed Sampling (IDS) policies, a class of heuristics that balance immediate regret against information gain. We focus on the tractable environment of two-state Bernoulli bandits as a minimal model to rigorously compare heuristic strategies against the optimal policy. We extend the IDS framework to the discounted infinite-horizon setting by introducing a modified information measure and a tuning parameter to modulate the decision-making behavior. We examine two specific problem classes: symmetric bandits and the scenario involving one fair coin. In the symmetric case we show that IDS achieves bounded cumulative regret, whereas in the one-fair-coin scenario the IDS policy yields a regret that scales…
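To make the trade-off between immediate regret and information gain concrete, here is a minimal sketch of one IDS decision step on a two-state Bernoulli bandit. It uses the commonly cited information ratio (squared expected regret divided by expected information gain about the hidden state), not the modified information measure or tuning parameter the abstract mentions for the discounted setting. The function names, the `probs[state][arm]` layout, the grid search, and the example probabilities are all assumptions made for illustration.

```python
import math

def entropy(b):
    """Binary entropy (nats) of a belief b = P(state = 1)."""
    if b <= 0.0 or b >= 1.0:
        return 0.0
    return -b * math.log(b) - (1.0 - b) * math.log(1.0 - b)

def posterior(b, p0, p1, reward):
    """Bayes update of b after a reward from an arm whose success
    probability is p0 in state 0 and p1 in state 1."""
    like0 = p0 if reward else 1.0 - p0
    like1 = p1 if reward else 1.0 - p1
    evidence = (1.0 - b) * like0 + b * like1
    return b * like1 / evidence

def expected_regret(b, probs, arm):
    """Expected one-step regret of `arm` under belief b.
    probs[state][arm] is a success probability (hypothetical layout)."""
    return sum(w * (max(probs[s]) - probs[s][arm])
               for s, w in ((0, 1.0 - b), (1, b)))

def information_gain(b, probs, arm):
    """Expected drop in entropy of the state belief after pulling `arm`."""
    p0, p1 = probs[0][arm], probs[1][arm]
    p_success = (1.0 - b) * p0 + b * p1
    gain = entropy(b)
    for reward, pr in ((1, p_success), (0, 1.0 - p_success)):
        if pr > 0.0:
            gain -= pr * entropy(posterior(b, p0, p1, reward))
    return gain

def ids_prob_arm1(b, probs, grid=101):
    """Probability of pulling arm 1 that minimizes regret^2 / information,
    found by a coarse grid search (an assumption, chosen for simplicity)."""
    r0, r1 = expected_regret(b, probs, 0), expected_regret(b, probs, 1)
    g0, g1 = information_gain(b, probs, 0), information_gain(b, probs, 1)
    best_pi, best_ratio = 0.0, math.inf
    for k in range(grid):
        pi = k / (grid - 1)
        delta = (1.0 - pi) * r0 + pi * r1
        gain = (1.0 - pi) * g0 + pi * g1
        if gain > 1e-12:
            ratio = delta ** 2 / gain
        else:
            # No information: acceptable only if there is also no regret.
            ratio = 0.0 if delta <= 1e-12 else math.inf
        if ratio < best_ratio:
            best_pi, best_ratio = pi, ratio
    return best_pi

# Illustrative parameters only; they are not taken from the paper.
symmetric = [[0.7, 0.3], [0.3, 0.7]]       # arms swap roles between states
one_fair_coin = [[0.5, 0.3], [0.5, 0.7]]   # arm 0 is fair in both states

if __name__ == "__main__":
    print(ids_prob_arm1(0.9, symmetric))
    print(ids_prob_arm1(0.3, one_fair_coin))
```

In this sketch the two symmetric arms are equally informative, so the ratio is driven by regret alone. In the one-fair-coin example the fair arm reveals nothing about the state, so any information has to come from the other arm, which is where the tension between exploring and exploiting shows up.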
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Stochastic Gradient Optimization Techniques
