TL;DR
This paper introduces feature perturbation, a novel exploration method for contextual bandits that injects randomness into features, achieving strong theoretical regret bounds and superior empirical performance while being computationally efficient.
Contribution
The paper presents a new feature perturbation strategy for exploration in contextual bandits, providing near-optimal regret bounds and extending naturally to complex models.
Findings
Achieves a $\tilde{O}(d\sqrt{T})$ regret bound for generalized linear bandits.
Outperforms existing exploration methods in empirical evaluations.
Efficient and adaptable to non-parametric and neural network models.
Abstract
We propose feature perturbation, a simple yet effective exploration strategy for contextual bandits that injects randomness directly into feature inputs, instead of randomizing unknown parameters or adding noise to rewards. Remarkably, this algorithm achieves a $\tilde{O}(d\sqrt{T})$ worst-case regret bound for generalized linear contextual bandits, while avoiding the $\tilde{O}(d^{3/2}\sqrt{T})$ regret typical of existing randomized bandit algorithms. Because our algorithm eschews parameter sampling, it is both computationally efficient and naturally extends to non-parametric or neural network models. We verify these advantages through empirical evaluations, demonstrating that feature perturbation not only surpasses existing methods but also unifies strong practical performance with near-optimal regret guarantees.
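To make the idea of perturbing features rather than parameters concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes a plain linear reward model (not a generalized linear one), a ridge-regression estimate, isotropic Gaussian noise on the features, and a noise scale that decays as $1/\sqrt{t}$. The names `select_arm` and `run_bandit` and all default values are hypothetical and chosen only for illustration.

```python
import numpy as np


def select_arm(features, theta_hat, noise_scale, rng):
    """Score randomly perturbed features with a fixed estimate and pick the best arm."""
    # Assumption: isotropic Gaussian noise is added to each arm's feature vector.
    # The parameter estimate theta_hat itself is never sampled or perturbed.
    noise = rng.normal(scale=noise_scale, size=features.shape)
    scores = (features + noise) @ theta_hat
    return int(np.argmax(scores))


def run_bandit(theta_star, n_rounds=500, n_arms=5, noise_scale=0.3, reg=1.0, seed=0):
    """Simulate a linear contextual bandit with feature-perturbed arm selection."""
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    gram = reg * np.eye(d)   # regularized Gram matrix of played features
    b = np.zeros(d)          # running sum of reward-weighted features
    theta_hat = np.zeros(d)
    regret = 0.0
    for t in range(n_rounds):
        # Fresh context each round: one feature vector per arm.
        features = rng.normal(size=(n_arms, d)) / np.sqrt(d)
        theta_hat = np.linalg.solve(gram, b)  # ridge estimate
        # Assumption: exploration noise shrinks as more data is collected.
        arm = select_arm(features, theta_hat, noise_scale / np.sqrt(t + 1), rng)
        x = features[arm]
        reward = x @ theta_star + 0.1 * rng.normal()
        gram += np.outer(x, x)
        b += reward * x
        # Track regret against the best arm under the true parameter.
        means = features @ theta_star
        regret += means.max() - means[arm]
    return theta_hat, regret


if __name__ == "__main__":
    theta_star = np.array([1.0, -0.5, 0.25])
    theta_hat, regret = run_bandit(theta_star)
    print("estimate:", theta_hat)
    print("cumulative regret:", regret)
```

The only randomness in arm selection comes from the perturbed inputs, so the same selection step could in principle wrap any predictor that maps features to a score, which is the property the abstract points to when it mentions non-parametric and neural network models.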