Information-directed sampling for bandits: a primer

Annika Hirling; Giorgio Nicoletti; Antonio Celani

arXiv:2512.20096·cs.LG·December 24, 2025

Open Access

TL;DR

This paper reviews Information Directed Sampling (IDS) policies for multi-armed bandits, analyzes their performance in two-state Bernoulli bandits, extends them to the discounted infinite-horizon setting, and shows that IDS regret is bounded in symmetric bandits and logarithmic in the horizon in the one-fair-coin case.

Contribution

The paper extends IDS to discounted infinite-horizon bandits by introducing a modified information measure and a tuning parameter, and compares heuristic strategies against the optimal policy in two-state Bernoulli bandits.

Findings

01. IDS achieves bounded cumulative regret in symmetric bandits.

02. In the one-fair-coin case, IDS regret scales logarithmically with the horizon.

03. The paper bridges concepts from reinforcement learning and information theory.

Abstract

The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directed Sampling (IDS) policies, a class of heuristics that balance immediate regret against information gain. We focus on the tractable environment of two-state Bernoulli bandits as a minimal model to rigorously compare heuristic strategies against the optimal policy. We extend the IDS framework to the discounted infinite-horizon setting by introducing a modified information measure and a tuning parameter to modulate the decision-making behavior. We examine two specific problem classes: symmetric bandits and the scenario involving one fair coin. In the symmetric case we show that IDS achieves bounded cumulative regret, whereas in the one-fair-coin scenario the IDS policy yields a regret that scales…
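To make the trade-off between immediate regret and information gain concrete, here is a minimal Python sketch of a one-step IDS rule for a two-armed Bernoulli bandit with two candidate environments. It uses the commonly used information ratio (squared expected regret divided by expected information gain), minimized over the probability of playing the second arm by a simple grid search. This is an illustrative assumption, not the paper's method: the modified information measure and tuning parameter for the discounted setting are not reproduced, and the names `ids_probability`, `regret_and_gain`, `means_a`, `means_b`, and `belief` are hypothetical.

```python
import math


def binary_entropy(x):
    """Entropy (in nats) of a Bernoulli(x) outcome."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def regret_and_gain(belief, means_a, means_b):
    """Expected regret and information gain of each arm.

    belief  : probability that environment A is the true one
    means_a : success probability of each arm under environment A
    means_b : success probability of each arm under environment B
    """
    # Expected reward of an oracle that knows the true environment.
    best = belief * max(means_a) + (1.0 - belief) * max(means_b)
    regrets, gains = [], []
    for pa, pb in zip(means_a, means_b):
        mix = belief * pa + (1.0 - belief) * pb  # predictive success probability
        regrets.append(best - mix)
        # Mutual information between the environment and the next outcome.
        gains.append(binary_entropy(mix)
                     - belief * binary_entropy(pa)
                     - (1.0 - belief) * binary_entropy(pb))
    return regrets, gains


def ids_probability(belief, means_a, means_b, grid=1001):
    """Probability of playing arm 1 that minimizes the information ratio."""
    (d0, d1), (g0, g1) = regret_and_gain(belief, means_a, means_b)
    best_q, best_ratio = 0.0, math.inf
    for i in range(grid):
        q = i / (grid - 1)
        regret = (1.0 - q) * d0 + q * d1
        gain = (1.0 - q) * g0 + q * g1
        if regret <= 1e-12:
            ratio = 0.0          # no regret: nothing to trade off
        elif gain <= 1e-12:
            ratio = math.inf     # regret paid without learning anything
        else:
            ratio = regret ** 2 / gain
        if ratio < best_ratio:
            best_q, best_ratio = q, ratio
    return best_q


# Symmetric bandit: the two environments swap the arms' success probabilities.
q_symmetric = ids_probability(0.9, (0.7, 0.3), (0.3, 0.7))
# One fair coin: arm 1 pays 0.5 in both environments and reveals nothing.
q_fair_coin = ids_probability(0.1, (0.7, 0.5), (0.3, 0.5))
```

In this sketch the symmetric case offers the same information from either arm, so the rule simply plays the arm with the lower expected regret. In the one-fair-coin case only the unknown arm is informative, so the rule can mix between the two arms rather than commit to one.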

Taxonomy

Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Stochastic Gradient Optimization Techniques