Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift
Bochao Li, Yao Fu, Wei Chen, Fang Kong

TL;DR
This paper introduces Anchor-TS, a Thompson sampling variant with median-based anchoring, designed to effectively leverage offline data for improved online decision-making under distribution shift.
Contribution
The paper proposes a novel median-based anchoring rule for Thompson sampling that addresses distribution shift between offline and online data, supported by theoretical guarantees and empirical validation.
Findings
Anchor-TS reduces regret compared to baselines.
Median-based anchoring mitigates the bias introduced by distribution shift.
Theoretical analysis quantifies offline data impact.
Abstract
Offline-to-online learning aims to improve online decision-making by leveraging offline logged data. A central challenge in this setting is the distribution shift between offline and online environments. While some existing works attempt to leverage shifted offline data, they largely rely on UCB-type algorithms. Thompson sampling (TS) represents another canonical class of bandit algorithms, well known for its strong empirical performance and naturally suited to offline-to-online learning through its Bayesian formulation. However, unlike UCB indices, posterior samples in TS are not guaranteed to be optimistic with respect to the true arm means. This makes indices constructed from purely online and hybrid data difficult to compare and complicates their use. To address this issue, we propose sample-mean anchored TS (Anchor-TS), which introduces a novel median-based anchoring rule that…
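The abstract's remark that posterior samples in TS are not guaranteed to be optimistic can be illustrated with a small simulation. The sketch below is not Anchor-TS and does not reproduce the paper's anchoring rule; it only compares, for a single arm, how often a UCB1-style index and a Gaussian posterior sample land at or above the true mean. The function names, the unit-variance Gaussian reward model, the flat prior, and all parameter values are assumptions made for illustration.

```python
import math
import random


def ucb_index(sample_mean, n_pulls, t):
    """UCB1-style index: sample mean plus a confidence bonus (assumed form)."""
    return sample_mean + math.sqrt(2.0 * math.log(t) / n_pulls)


def ts_sample(sample_mean, n_pulls, rng):
    """Gaussian posterior sample centred on the sample mean.

    Assumes unit-variance rewards and a flat prior, so the posterior
    standard deviation is 1 / sqrt(n_pulls).
    """
    return rng.gauss(sample_mean, 1.0 / math.sqrt(n_pulls))


def optimism_rates(true_mean=0.5, n_pulls=20, t=100, trials=5000, seed=0):
    """Fraction of trials in which each index is >= the true arm mean."""
    rng = random.Random(seed)
    ucb_hits = 0
    ts_hits = 0
    for _ in range(trials):
        # Draw n_pulls noisy rewards for one arm and compute the sample mean.
        rewards = [rng.gauss(true_mean, 1.0) for _ in range(n_pulls)]
        mean = sum(rewards) / n_pulls
        ucb_hits += ucb_index(mean, n_pulls, t) >= true_mean
        ts_hits += ts_sample(mean, n_pulls, rng) >= true_mean
    return ucb_hits / trials, ts_hits / trials


if __name__ == "__main__":
    ucb_rate, ts_rate = optimism_rates()
    print(f"UCB index optimistic in {ucb_rate:.1%} of trials")
    print(f"TS sample optimistic in {ts_rate:.1%} of trials")
```

Under these assumptions the UCB index sits above the true mean in nearly every trial, while the posterior sample does so only about half the time, since it is symmetric around the sample mean. This is the gap the abstract points to when it says TS indices built from different data sources are hard to compare.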
