Exploration via Feature Perturbation in Contextual Bandits

Seouh-won Yi; Min-hwan Oh

arXiv:2510.17390·cs.LG·October 27, 2025

TL;DR

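This paper introduces feature perturbation, an exploration method for contextual bandits that injects randomness into the feature inputs. It attains a $\tilde{O}(d\sqrt{T})$ worst-case regret bound for generalized linear contextual bandits, outperforms existing exploration methods in the authors' experiments, and stays computationally efficient because it avoids parameter sampling.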

Contribution

The paper presents a feature perturbation strategy for exploration in contextual bandits, with a near-optimal regret bound for generalized linear contextual bandits and a natural extension to non-parametric and neural network models.

Findings

01

Achieves a $\tilde{O}(d\sqrt{T})$ regret bound for generalized linear bandits.

02

Outperforms existing exploration methods in empirical evaluations.

03

Efficient and adaptable to non-parametric and neural network models.

Abstract

We propose feature perturbation, a simple yet effective exploration strategy for contextual bandits that injects randomness directly into feature inputs, instead of randomizing unknown parameters or adding noise to rewards. Remarkably, this algorithm achieves a $\tilde{O}(d\sqrt{T})$ worst-case regret bound for generalized linear contextual bandits, while avoiding the $\tilde{O}(d^{3/2}\sqrt{T})$ regret typical of existing randomized bandit algorithms. Because our algorithm eschews parameter sampling, it is both computationally efficient and naturally extends to non-parametric or neural network models. We verify these advantages through empirical evaluations, demonstrating that feature perturbation not only surpasses existing methods but also unifies strong practical performance with near-optimal regret guarantees.
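
The core idea in the abstract (randomize the features, keep the parameter estimate fixed) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the perturbation distribution, so the noise model below (isotropic Gaussian noise scaled by each arm's uncertainty width), the linear reward model (the identity-link special case of a generalized linear model), and the names `select_arm`, `run`, `scale`, and `lam` are all assumptions made for illustration.

```python
import numpy as np


def select_arm(contexts, theta_hat, V_inv, scale, rng):
    """Choose an arm by perturbing features while keeping theta_hat fixed.

    contexts : (K, d) feature vectors, one row per arm
    theta_hat: (d,) current ridge estimate
    V_inv    : (d, d) inverse of the regularized Gram matrix
    scale    : perturbation scale (hypothetical tuning knob)
    """
    K, d = contexts.shape
    # Per-arm uncertainty width sqrt(x^T V^{-1} x).
    widths = np.sqrt(np.einsum("kd,de,ke->k", contexts, V_inv, contexts))
    # ASSUMED noise model: isotropic Gaussian noise on each feature
    # vector, scaled by that arm's width. The paper may use another.
    noise = rng.standard_normal((K, d)) * (scale * widths)[:, None]
    perturbed = contexts + noise
    # Greedy choice under the perturbed features; no parameter sampling.
    return int(np.argmax(perturbed @ theta_hat))


def run(T=500, K=5, d=4, scale=1.0, lam=1.0, seed=0):
    """Simulate a linear contextual bandit and return cumulative regret."""
    rng = np.random.default_rng(seed)
    theta_star = rng.standard_normal(d)
    theta_star /= np.linalg.norm(theta_star)
    V = lam * np.eye(d)  # regularized Gram matrix
    b = np.zeros(d)      # running sum of reward * feature
    regret = 0.0
    for _ in range(T):
        contexts = rng.standard_normal((K, d)) / np.sqrt(d)
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b  # ridge regression estimate
        a = select_arm(contexts, theta_hat, V_inv, scale, rng)
        x = contexts[a]
        reward = x @ theta_star + 0.1 * rng.standard_normal()
        # Update statistics with the ORIGINAL (unperturbed) feature.
        V += np.outer(x, x)
        b += reward * x
        means = contexts @ theta_star
        regret += means.max() - means[a]
    return regret, np.linalg.inv(V) @ b, theta_star
```

Setting `scale=0` reduces `select_arm` to a purely greedy policy, which makes it easy to see how much of the behavior comes from the feature noise. Because only a point estimate and a forward pass on perturbed inputs are needed, the same selection step would in principle carry over to any model that maps features to a predicted reward, which is the property the abstract credits for the extension to non-parametric and neural network models.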

Videos

Exploration via Feature Perturbation in Contextual Bandits (SlidesLive)