RIE-Greedy: Regularization-Induced Exploration for Contextual Bandits

Tong Li, Thiago de Queiroz Casanova, Eric M. Schwartz, Victor Kostyuk, Dehan Kong, and Joseph J. Williams

arXiv:2603.11276 · stat.ML · May 19, 2026

TL;DR

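This paper introduces RIE-Greedy, an exploration strategy for contextual bandits that selects actions greedily and relies on the randomness of cross-validation-based regularization during model fitting as its source of exploration. The approach is supported by a theoretical equivalence to Thompson Sampling in the two-armed case and by empirical gains over epsilon-greedy and other methods.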

Contribution

It demonstrates that regularization-induced stochasticity can serve as an effective exploration mechanism, bridging theory and practice in contextual bandit algorithms.

Findings

01

Regularization-induced exploration is theoretically equivalent to Thompson Sampling in two-armed bandits.

02

Empirically, RIE-Greedy outperforms epsilon-greedy and other methods in large-scale business environments.

Abstract

Real-world contextual bandit problems with complex reward models are often tackled with iteratively trained models, such as boosted trees. However, it is difficult to directly apply simple and effective exploration strategies, such as Thompson Sampling or UCB, on top of those black-box estimators. Existing approaches rely on sophisticated assumptions or intractable procedures that are hard to verify and implement in practice. In this work, we explore the use of an exploration-free (pure-greedy) action selection strategy that exploits the randomness inherent in the model-fitting process as an intrinsic source of exploration. More specifically, we note that the stochasticity in the cross-validation-based regularization process can naturally induce Thompson Sampling-like exploration. We show that this regularization-induced exploration is theoretically equivalent to Thompson Sampling in the…
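The mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' RIE-Greedy implementation: it assumes a context-free two-armed Bernoulli bandit, an intercept-only ridge estimate per arm, and a small grid of regularization strengths chosen by a freshly randomized K-fold split at every round. The names `cv_shrunk_mean` and `run_greedy`, the `lambdas` grid, and the `warmup` round-robin phase are all hypothetical choices made for illustration. The point is only that action selection is purely greedy, while the estimates themselves vary from round to round because the cross-validation split is random.

```python
import numpy as np


def cv_shrunk_mean(rewards, rng, lambdas=(0.0, 1.0, 10.0), n_folds=2):
    """Ridge estimate of an arm's mean reward (intercept-only model).

    Minimizing sum((r - m)^2) + lam * m^2 gives m = sum(r) / (n + lam).
    The strength lam is picked by K-fold cross-validation on a random
    split, so repeated calls on the same data can return different values.
    """
    rewards = np.asarray(rewards, dtype=float)
    n = rewards.size
    if n < n_folds:
        # Too few observations to cross-validate: use the heaviest shrinkage.
        lam = max(lambdas)
        return rewards.sum() / (n + lam) if n + lam > 0 else 0.0

    # Random, balanced fold assignment; this is the only source of randomness.
    folds = rng.permutation(n) % n_folds
    best_lam, best_err = lambdas[0], np.inf
    for lam in lambdas:
        err = 0.0
        for k in range(n_folds):
            train, test = rewards[folds != k], rewards[folds == k]
            est = train.sum() / (train.size + lam)
            err += np.sum((test - est) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    # Refit on all data with the selected regularization strength.
    return rewards.sum() / (n + best_lam)


def run_greedy(true_means=(0.3, 0.6), horizon=500, warmup=5, seed=0):
    """Greedy play on CV-regularized estimates; returns pull counts per arm."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    history = [[] for _ in range(n_arms)]
    pulls = np.zeros(n_arms, dtype=int)
    for t in range(horizon):
        if t < warmup * n_arms:
            # Assumed warm-up: round-robin so every arm has some data.
            arm = t % n_arms
        else:
            # Pure-greedy choice; estimates are refit (and re-split) each round.
            estimates = [cv_shrunk_mean(h, rng) for h in history]
            arm = int(np.argmax(estimates))
        reward = float(rng.random() < true_means[arm])
        history[arm].append(reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_greedy())
```

Running the script prints how often each arm was pulled. Replacing `cv_shrunk_mean` with the plain sample mean gives an ordinary greedy baseline for comparison under the same assumptions.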

Taxonomy

Topics: Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI) · Advanced Causal Inference Techniques