Adaptive Exploration in Linear Contextual Bandit
Botao Hao, Tor Lattimore, Csaba Szepesvari

TL;DR
This paper introduces an adaptive algorithm for linear contextual bandits that is asymptotically optimal, performs well in finite time, and, when the context distribution is well behaved, acts mostly greedily and enjoys sub-logarithmic regret.
Contribution
The paper proposes an adaptive exploration algorithm for linear contextual bandits that combines asymptotic optimality with good finite-time empirical performance and adapts automatically to well-behaved context distributions.
Findings
Achieves asymptotic optimality in regret.
Performs well in finite-time experiments, outperforming baselines.
Automatically adapts to well-behaved context distributions, where it acts mostly greedily and enjoys sub-logarithmic regret.
Abstract
Contextual bandits serve as a fundamental model for many sequential decision making tasks. The most popular theoretically justified approaches are based on the optimism principle. While these algorithms can be practical, they are known to be suboptimal asymptotically. On the other hand, existing asymptotically optimal algorithms for this problem do not exploit the linear structure in an optimal way and suffer from lower-order terms that dominate the regret in all practically interesting regimes. We start to bridge the gap by designing an algorithm that is asymptotically optimal and has good finite-time empirical performance. At the same time, we make connections to the recent literature on when exploration-free methods are effective. Indeed, if the distribution of contexts is well behaved, then our algorithm acts mostly greedily and enjoys sub-logarithmic regret. Furthermore, our…
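To make the contrast in the abstract concrete, the sketch below compares a greedy action choice with an optimism-based (LinUCB-style) choice in a linear contextual bandit. It is a generic illustration and not the algorithm proposed in the paper. The function names, the ridge parameter `lam`, and the confidence multiplier `beta` are all assumptions introduced here.

```python
import numpy as np

def ridge_estimate(X, y, lam=1.0):
    """Regularized least-squares estimate of the unknown parameter.

    X: (n, d) array of features of previously played actions.
    y: (n,) array of observed rewards.
    lam: ridge parameter (an assumed default, not taken from the paper).
    Returns the estimate theta_hat and the regularized Gram matrix V.
    """
    d = X.shape[1]
    V = lam * np.eye(d) + X.T @ X
    theta_hat = np.linalg.solve(V, X.T @ y)
    return theta_hat, V

def greedy_action(features, theta_hat):
    """Pick the action with the largest estimated reward (no exploration)."""
    return int(np.argmax(features @ theta_hat))

def optimistic_action(features, theta_hat, V, beta=1.0):
    """Pick the action with the largest upper confidence bound.

    features: (k, d) array, one feature vector per action in this context.
    beta: hypothetical confidence multiplier scaling the exploration bonus.
    """
    V_inv = np.linalg.inv(V)
    # Bonus for action a is sqrt(x_a^T V^{-1} x_a): large in directions
    # that have rarely been played so far.
    widths = np.sqrt(np.einsum("ad,de,ae->a", features, V_inv, features))
    return int(np.argmax(features @ theta_hat + beta * widths))
```

With `beta=0` the optimistic rule reduces to the greedy one. Larger values of `beta` push the choice toward directions with little data, which is the exploration that a greedy method skips.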
Taxonomy
Topics: Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques · Smart Grid Energy Management
Methods: Affine Coupling · Normalizing Flows
