Adaptive Data Augmentation for Thompson Sampling

Wonyoung Kim

arXiv:2506.14479·stat.ML·June 18, 2025

TL;DR

This paper introduces an adaptive data augmentation technique for Thompson Sampling in linear contextual bandits, achieving near-optimal regret bounds and improved empirical performance without strong distributional assumptions.

Contribution

It develops a novel estimator with adaptive augmentation and coupling of hypothetical samples, enhancing parameter learning and reward prediction in linear bandits.

Findings

01. Achieves nearly minimax optimal regret bounds.

02. Demonstrates robust empirical performance improvements.

03. Does not rely on assumptions about context distribution.

Abstract

In linear contextual bandits, the objective is to select actions that maximize cumulative rewards, where the expected reward is modeled as a linear function of the context with unknown parameters. Although Thompson Sampling performs well empirically, it does not achieve optimal regret bounds. This paper proposes a nearly minimax optimal Thompson Sampling algorithm for linear contextual bandits, built on a novel estimator that adaptively augments and couples hypothetical samples designed for efficient parameter learning. The proposed estimator accurately predicts rewards for all arms without relying on assumptions about the context distribution. Empirical results show robust performance and significant improvement over existing methods.
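For readers new to the setting, the following is a minimal sketch of the standard linear Thompson Sampling baseline that the abstract refers to, not the paper's proposed estimator: it uses a plain ridge estimate and has no adaptive augmentation or coupling of hypothetical samples. The function name `linear_thompson_sampling` and the parameters `noise_sd`, `v`, and `lam` are hypothetical choices made for illustration.

```python
import numpy as np

def linear_thompson_sampling(contexts, theta_true, noise_sd=0.1, v=1.0, lam=1.0, seed=0):
    """Run standard linear Thompson Sampling (illustrative baseline only).

    contexts   : array of shape (T, K, d), one d-dimensional context per arm per round
    theta_true : array of shape (d,), the unknown parameter (used only to simulate rewards)
    Returns the cumulative regret after each round, shape (T,).
    """
    rng = np.random.default_rng(seed)
    T, K, d = contexts.shape
    B = lam * np.eye(d)   # regularized Gram matrix of the chosen contexts
    f = np.zeros(d)       # running sum of (chosen context * observed reward)
    regret = np.zeros(T)
    total = 0.0
    for t in range(T):
        X = contexts[t]                      # (K, d) contexts for this round
        theta_hat = np.linalg.solve(B, f)    # ridge estimate of the parameter
        # Draw theta_tilde ~ N(theta_hat, v^2 * B^{-1}) via a Cholesky factor of B.
        L = np.linalg.cholesky(B)
        z = rng.standard_normal(d)
        theta_tilde = theta_hat + v * np.linalg.solve(L.T, z)
        arm = int(np.argmax(X @ theta_tilde))  # act greedily on the sampled parameter
        means = X @ theta_true
        reward = means[arm] + noise_sd * rng.standard_normal()
        # Only the chosen arm's context and reward update the estimator.
        B += np.outer(X[arm], X[arm])
        f += reward * X[arm]
        total += means.max() - means[arm]
        regret[t] = total
    return regret

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    contexts = rng.normal(size=(500, 5, 3))   # synthetic contexts, assumed for the demo
    theta = np.array([1.0, -0.5, 0.2])
    print(linear_thompson_sampling(contexts, theta)[-1])
```

Note that this baseline learns only from the arm it actually pulls in each round. The abstract's claim that the proposed estimator predicts rewards accurately for all arms is what distinguishes it from this sketch.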

Taxonomy

Topics: Machine Learning and Algorithms