Thompson Sampling with Virtual Helping Agents
Kartik Anand Pant, Amod Hegde, and K. V. Srinivas

TL;DR
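This paper introduces a framework that tunes the exploration-exploitation trade-off in Thompson sampling by drawing multiple samples from the posterior distribution. It proposes two multi-armed bandit algorithms with theoretical bounds on cumulative regret and reports an empirical improvement in cumulative regret over standard Thompson sampling.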
Contribution
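The authors propose a general framework for heuristically tuning the exploration-exploitation trade-off in Thompson sampling using multiple posterior samples, two multi-armed bandit algorithms built on that framework with theoretical cumulative-regret bounds, and extensions to additional bandit problems.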
Findings
In the reported experiments, the proposed algorithms achieve lower cumulative regret than standard Thompson sampling.
Framework allows task-specific adjustment of exploration-exploitation trade-off.
Empirical results on real-world datasets validate the approach.
Abstract
We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting the current knowledge to maximize immediate performance and exploring the new information to gain long-term benefits using the multi-armed bandit framework. Thompson sampling is one of the heuristics for choosing actions that address this exploration-exploitation dilemma. We first propose a general framework that helps heuristically tune the exploration versus exploitation trade-off in Thompson sampling using multiple samples from the posterior distribution. Utilizing this framework, we propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret. Next, we demonstrate the empirical improvement in the cumulative regret performance of the proposed algorithm over Thompson Sampling. We also show the effectiveness of the proposed…
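The abstract describes tuning exploration versus exploitation by drawing multiple posterior samples per arm. The following is a minimal sketch of that general idea for a Bernoulli bandit with Beta(1, 1) priors, not the paper's two algorithms, whose details are not given here. The names `select_arm`, `run_bandit`, `num_samples`, and `combine` are hypothetical, and the rule of combining draws with `max` (or a mean) is an assumption made for illustration. With `num_samples=1` the sketch reduces to standard Thompson sampling.

```python
import random


def select_arm(successes, failures, num_samples=1, combine=max, rng=random):
    """Pick an arm using Beta(1 + s, 1 + f) posteriors.

    Each arm's score combines `num_samples` posterior draws. The `combine`
    rule is a hypothetical tuning knob, not the paper's method.
    """
    scores = []
    for s, f in zip(successes, failures):
        draws = [rng.betavariate(1 + s, 1 + f) for _ in range(num_samples)]
        scores.append(combine(draws))
    # Play the arm with the highest combined score.
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(true_means, horizon, num_samples=1, combine=max, seed=0):
    """Simulate a Bernoulli bandit and return (pseudo-regret, successes, failures)."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        arm = select_arm(successes, failures, num_samples, combine, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        # Pseudo-regret: gap between the best arm's mean and the played arm's mean.
        regret += best - true_means[arm]
    return regret, successes, failures
```

Calling `run_bandit([0.2, 0.5, 0.8], horizon=1000, num_samples=1)` and then repeating with a larger `num_samples` or a different `combine` function gives a rough way to compare cumulative regret across settings. Any such comparison from this sketch is illustrative only and says nothing about the results reported in the paper.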
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing
