Adaptive Data Augmentation for Thompson Sampling
Wonyoung Kim

TL;DR
This paper introduces an adaptive data augmentation technique for Thompson Sampling in linear contextual bandits, achieving near-optimal regret bounds and improved empirical performance without strong distributional assumptions.
Contribution
It develops a novel estimator with adaptive augmentation and coupling of hypothetical samples, enhancing parameter learning and reward prediction in linear bandits.
Findings
Achieves nearly minimax optimal regret bounds.
Demonstrates robust empirical performance improvements.
Does not rely on assumptions about context distribution.
Abstract
In linear contextual bandits, the objective is to select actions that maximize cumulative reward, where the expected reward is modeled as a linear function of the context with unknown parameters. Although Thompson Sampling performs well empirically, it does not achieve optimal regret bounds. This paper proposes a nearly minimax optimal Thompson Sampling algorithm for linear contextual bandits, built on a novel estimator that uses adaptive augmentation and coupling of hypothetical samples designed for efficient parameter learning. The proposed estimator accurately predicts rewards for all arms without relying on assumptions about the context distribution. Empirical results show robust performance and significant improvement over existing methods.
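For orientation, the setting described in the abstract can be illustrated with a minimal sketch of standard linear Thompson Sampling using a ridge estimator. This is only the common baseline, not the paper's adaptive augmentation or coupling estimator, which is not specified here. The function name `lin_ts`, the parameters `noise_sd`, `v`, and `lam`, and the synthetic Gaussian contexts are all assumptions made for illustration.

```python
import numpy as np


def lin_ts(contexts, theta_star, noise_sd=0.1, v=1.0, lam=1.0, seed=0):
    """Baseline linear Thompson Sampling (hypothetical helper, not the paper's method).

    contexts   : array of shape (T, K, d), one d-dimensional context per arm per round
    theta_star : true parameter of shape (d,), used only to simulate rewards
    Returns the cumulative regret after each round, shape (T,).
    """
    rng = np.random.default_rng(seed)
    T, K, d = contexts.shape
    V = lam * np.eye(d)   # regularized Gram matrix of the chosen contexts
    b = np.zeros(d)       # sum of reward-weighted chosen contexts
    regret = np.zeros(T)

    for t in range(T):
        X = contexts[t]                    # (K, d) contexts for this round
        theta_hat = np.linalg.solve(V, b)  # ridge estimate of the parameter

        # Draw theta_tilde ~ N(theta_hat, v^2 V^{-1}) via the Cholesky factor V = L L^T:
        # L^{-T} z has covariance (L L^T)^{-1} = V^{-1}.
        L = np.linalg.cholesky(V)
        z = rng.standard_normal(d)
        theta_tilde = theta_hat + v * np.linalg.solve(L.T, z)

        arm = int(np.argmax(X @ theta_tilde))  # act greedily on the sampled parameter

        means = X @ theta_star                 # true expected rewards (simulation only)
        reward = means[arm] + noise_sd * rng.standard_normal()

        # Only the chosen arm's context and reward enter the estimator.
        V += np.outer(X[arm], X[arm])
        b += reward * X[arm]

        regret[t] = means.max() - means[arm]

    return np.cumsum(regret)


# Small synthetic run (assumed setup: Gaussian contexts, 10 arms, dimension 5).
demo_rng = np.random.default_rng(1)
theta_star = demo_rng.standard_normal(5)
contexts = demo_rng.standard_normal((300, 10, 5))
cum_regret = lin_ts(contexts, theta_star)
print(f"cumulative regret after {len(cum_regret)} rounds: {cum_regret[-1]:.3f}")
```

In this baseline, the estimator is updated only with the context and reward of the arm that was actually pulled. The abstract indicates that the proposed estimator instead aims to predict rewards accurately for all arms, which is where the augmentation and hypothetical samples would come in.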
Taxonomy
Topics: Machine Learning and Algorithms
