Stability and Robustness via Regularization: Bandit Inference via Regularized Stochastic Mirror Descent

Budhaditya Halder; Ishan Sengupta; Koustav Chowdhury; Koulik Khamaru

arXiv:2603.10184·stat.ML·March 12, 2026

Open Access

TL;DR

This paper develops a stability-based framework for bandit algorithms using stochastic mirror descent, enabling valid inference, optimal learning, and robustness to corruption.

Contribution

The paper introduces a stability criterion for bandit algorithms, designs regularized-EXP3 algorithms that satisfy this criterion, and proves their robustness and optimality.

Findings

01. Stability criterion ensures valid inference in bandit algorithms.

02. Regularized-EXP3 achieves minimax-optimal regret and valid confidence intervals.

03. Modified algorithms maintain asymptotic normality under adversarial corruption.

Abstract

Statistical inference with bandit data presents fundamental challenges due to adaptive sampling, which violates the independence assumptions underlying classical asymptotic theory. Recent work has identified stability as a sufficient condition for valid inference under adaptivity. This paper develops a systematic theory of stability for bandit algorithms based on stochastic mirror descent, a broad algorithmic framework that includes the widely-used EXP3 algorithm as a special case. Our contributions are threefold. First, we establish a general stability criterion: if the average iterates of a stochastic mirror descent algorithm converge in ratio to a non-random probability vector, then the induced bandit algorithm is stable. This result provides a unified lens for analyzing stability across diverse algorithmic instantiations. Second, we introduce a family of regularized-EXP3…
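As a point of reference for the abstract's remark that EXP3 is a special case of stochastic mirror descent, the following is a minimal sketch of the standard textbook, loss-based EXP3 update on a Bernoulli bandit. It is not the regularized-EXP3 family proposed in the paper. The function name `exp3`, its parameters, and the Bernoulli reward model are assumptions made for illustration only. The sketch also tracks the average of the sampling distributions, which is the kind of "average iterate" the stability criterion refers to.

```python
import numpy as np

def exp3(means, horizon, eta, seed=0):
    """Textbook loss-based EXP3 on a Bernoulli bandit (illustrative, hypothetical helper).

    Returns the per-arm pull counts and the time-averaged sampling distribution.
    """
    rng = np.random.default_rng(seed)
    k = len(means)
    cum_loss_est = np.zeros(k)      # importance-weighted cumulative loss estimates
    pulls = np.zeros(k, dtype=int)  # number of times each arm was pulled
    avg_iterate = np.zeros(k)       # running sum of the sampling distributions

    for _ in range(horizon):
        # Mirror descent with the negative-entropy mirror map reduces to
        # exponential weights over the estimated cumulative losses.
        logits = -eta * cum_loss_est
        logits -= logits.max()      # shift for numerical stability
        p = np.exp(logits)
        p /= p.sum()

        arm = rng.choice(k, p=p)
        reward = rng.binomial(1, means[arm])  # assumed Bernoulli rewards
        loss = 1.0 - reward

        # Unbiased importance-weighted loss estimate: only the pulled arm is updated.
        cum_loss_est[arm] += loss / p[arm]
        pulls[arm] += 1
        avg_iterate += p

    return pulls, avg_iterate / horizon


if __name__ == "__main__":
    pulls, avg = exp3(means=[0.9, 0.1], horizon=2000, eta=0.02, seed=0)
    print("pull counts:", pulls)
    print("average iterate:", avg)
```

Comparing `pulls / horizon` with the returned average iterate across several seeds gives an informal feel for whether the pull counts of a given algorithm settle near a non-random vector, which is the behaviour a stability condition asks for.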

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference