Neural Thompson Sampling

Weitong Zhang; Dongruo Zhou; Lihong Li; Quanquan Gu

arXiv:2010.00827·cs.LG·January 3, 2022·26 cites


TL;DR

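Neural Thompson Sampling is a Thompson Sampling algorithm for contextual bandits that uses a deep neural network for both exploration and exploitation. It comes with an O(T^{1/2}) cumulative regret guarantee for bounded reward functions, and the authors report experimental comparisons against benchmark bandit algorithms on several datasets.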

Contribution

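It develops a novel posterior distribution of the reward for Thompson Sampling: the mean is the neural network's prediction and the variance is built from the network's neural tangent features. This combines deep learning with a provable regret guarantee for contextual bandit problems.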

Findings

01 Achieves a cumulative regret of O(T^{1/2}), provided the underlying reward function is bounded.

02 In experiments, outperforms benchmark bandit algorithms on multiple datasets.

03 The theoretical analysis confirms the regret bound for the proposed method.
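
For readers unfamiliar with the term, cumulative regret in a $K$-armed contextual bandit is commonly defined as below. The symbols $x_{t,k}$ (context of arm $k$ in round $t$), $h$ (expected reward function), $a_t$ (arm pulled), and $a_t^*$ (best arm) are notation assumed here for illustration and do not appear on this page.

```latex
R_T = \sum_{t=1}^{T} \Bigl( h(x_{t,a_t^*}) - h(x_{t,a_t}) \Bigr),
\qquad a_t^* = \arg\max_{k \in \{1,\dots,K\}} h(x_{t,k})
```

Under this definition, a bound of order $T^{1/2}$ means the average per-round regret $R_T / T$ shrinks on the order of $T^{-1/2}$ as the number of rounds grows.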

Abstract

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation. At the core of our algorithm is a novel posterior distribution of the reward, where its mean is the neural network approximator, and its variance is built upon the neural tangent features of the corresponding neural network. We prove that, provided the underlying reward function is bounded, the proposed algorithm is guaranteed to achieve a cumulative regret of $O(T^{1/2})$, which matches the regret of other contextual bandit algorithms in terms of total round number $T$. Experimental comparisons with other benchmark bandit algorithms on various data sets corroborate our theory.
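
The posterior described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `NeuralTS`, the two-layer ReLU network, the hyperparameters `lam`, `nu`, `lr`, the initialisation, and the plain gradient-descent training loop (without the regularisation a full treatment would likely include) are all assumptions made for illustration. The idea it shows is the one the abstract states: each arm's sampled reward is drawn from a Gaussian whose mean is the network output and whose variance is computed from the gradient of the network with respect to its parameters (the neural tangent features).

```python
import numpy as np


class NeuralTS:
    """Hypothetical sketch of a neural Thompson Sampling agent (two-layer ReLU net)."""

    def __init__(self, dim, width=16, lam=1.0, nu=0.1, lr=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.m, self.lam, self.nu, self.lr = width, lam, nu, lr
        # Assumed Gaussian initialisation; chosen here only for illustration.
        self.W1 = self.rng.normal(0.0, np.sqrt(2.0 / dim), size=(width, dim))
        self.w2 = self.rng.normal(0.0, np.sqrt(1.0 / width), size=width)
        # Design matrix over all network parameters, started at lam * I.
        self.U = lam * np.eye(width * dim + width)
        self.X, self.r = [], []

    def predict(self, x):
        # Posterior mean: the network output f(x; theta).
        return float(self.w2 @ np.maximum(self.W1 @ x, 0.0))

    def grad(self, x):
        # Neural tangent feature: gradient of f(x; theta) w.r.t. all parameters.
        h = self.W1 @ x
        d_W1 = np.outer(self.w2 * (h > 0), x)
        d_w2 = np.maximum(h, 0.0)
        return np.concatenate([d_W1.ravel(), d_w2])

    def posterior(self, x):
        g = self.grad(x)
        # Posterior variance: lam * g^T U^{-1} g / m (clipped at 0 for safety).
        sigma2 = max(self.lam * (g @ np.linalg.solve(self.U, g)) / self.m, 0.0)
        return self.predict(x), sigma2, g

    def select(self, contexts):
        # Sample one reward per arm from its Gaussian posterior, pull the argmax.
        stats = [self.posterior(x) for x in contexts]
        samples = [self.rng.normal(mu, self.nu * np.sqrt(s2)) for mu, s2, _ in stats]
        k = int(np.argmax(samples))
        return k, stats[k][2]

    def update(self, x, reward, g, steps=20):
        # Rank-one update of the design matrix with the pulled arm's feature.
        self.U += np.outer(g, g) / self.m
        self.X.append(x)
        self.r.append(reward)
        X, r = np.array(self.X), np.array(self.r)
        # A few gradient-descent steps on the mean squared error over the history.
        for _ in range(steps):
            H = X @ self.W1.T
            err = np.maximum(H, 0.0) @ self.w2 - r
            g_w2 = np.maximum(H, 0.0).T @ err / len(r)
            g_W1 = ((err[:, None] * (H > 0)) * self.w2).T @ X / len(r)
            self.W1 -= self.lr * g_W1
            self.w2 -= self.lr * g_w2


if __name__ == "__main__":
    # Synthetic 4-armed problem with a bounded reward (an assumption for the demo).
    env = np.random.default_rng(1)
    a = env.normal(size=5)
    agent = NeuralTS(dim=5)
    total = 0.0
    for t in range(50):
        contexts = env.normal(size=(4, 5))
        k, g = agent.select(contexts)
        reward = np.cos(contexts[k] @ a) + 0.1 * env.normal()
        agent.update(contexts[k], reward, g)
        total += reward
    print(f"total reward over 50 rounds: {total:.2f}")
```

The variance term shrinks along directions in parameter space that past pulls have already covered, so arms whose tangent features resemble earlier ones are sampled closer to the network's prediction, while unfamiliar arms receive wider samples and therefore more exploration.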

Code & Models

Videos

Neural Thompson Sampling · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms