Thompson Sampling with Virtual Helping Agents

Kartik Anand Pant, Amod Hegde, and K. V. Srinivas

arXiv:2209.08197·cs.LG·September 20, 2022·1 cites

Open Access

TL;DR

This paper introduces a flexible framework for tuning exploration and exploitation in Thompson sampling, leading to improved performance in multi-armed bandit problems and related tasks, with theoretical and empirical validation.

Contribution

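The authors propose a general framework for heuristically tuning the exploration-exploitation trade-off in Thompson sampling by drawing multiple samples from the posterior distribution, along with two multi-armed bandit algorithms with bounds on cumulative regret and extensions to additional bandit problems.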

Findings

01. Proposed algorithms outperform standard Thompson sampling in cumulative regret.

02. Framework allows task-specific adjustment of exploration-exploitation trade-off.

03. Empirical results on real-world datasets validate the approach.

Abstract

We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting the current knowledge to maximize immediate performance and exploring the new information to gain long-term benefits using the multi-armed bandit framework. Thompson sampling is one of the heuristics for choosing actions that address this exploration-exploitation dilemma. We first propose a general framework that helps heuristically tune the exploration versus exploitation trade-off in Thompson sampling using multiple samples from the posterior distribution. Utilizing this framework, we propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret. Next, we demonstrate the empirical improvement in the cumulative regret performance of the proposed algorithm over Thompson Sampling. We also show the effectiveness of the proposed…
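As a rough illustration of what drawing multiple samples from the posterior can look like, the Python sketch below runs Thompson sampling on a Bernoulli bandit with Beta posteriors. It is not the authors' algorithm: the parameters `num_samples` and `combine`, and the function names `thompson_step` and `run_bandit`, are hypothetical and introduced only for this example. The assumption is that each arm's score is built from several posterior draws. Taking the maximum of the draws makes the score more optimistic and so tends to favor exploration, while averaging them pulls the score toward the posterior mean and so tends to favor exploitation. With `num_samples=1` the sketch reduces to standard Thompson sampling.

```python
import random


def mean(draws):
    """Average of a non-empty list of posterior draws."""
    return sum(draws) / len(draws)


def thompson_step(successes, failures, num_samples=1, combine=max, rng=random):
    """Choose an arm for a Bernoulli bandit with Beta(1 + s, 1 + f) posteriors.

    num_samples and combine are illustrative knobs (assumptions, not from the
    paper): each arm's score is combine() applied to num_samples posterior draws.
    """
    scores = []
    for s, f in zip(successes, failures):
        draws = [rng.betavariate(1 + s, 1 + f) for _ in range(num_samples)]
        scores.append(combine(draws))
    # Play the arm with the highest combined score.
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(true_means, horizon, num_samples=1, combine=max, seed=0):
    """Simulate the bandit and return (cumulative regret, successes, failures)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        arm = thompson_step(successes, failures, num_samples, combine, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        # Expected regret of this pull relative to the best arm.
        regret += best - true_means[arm]
    return regret, successes, failures
```

A call such as `run_bandit([0.2, 0.5, 0.7], horizon=2000, num_samples=5, combine=mean)` can be compared against the same call with `num_samples=1` to see how the choice of combination rule shifts the cumulative regret on a toy problem. Any such comparison only describes this sketch and says nothing about the results reported in the paper.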

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing