A study of Thompson Sampling with Parameter h
Qiang Ha

TL;DR
This paper investigates a modified Thompson Sampling algorithm with a parameter h that adjusts the importance of the current best arm, demonstrating its robustness in two-armed bandit problems.
Contribution
It introduces a parameterized version of Thompson Sampling and analyzes its robustness, extending understanding of its optimality under perturbations.
Findings
Optimality of Thompson Sampling is robust within a range of h values.
The modified algorithm maintains performance in two-armed bandit scenarios.
Parameter h influences the probability weighting in the sampling strategy.
Abstract
Thompson Sampling is a well-known Bayesian algorithm for solving the stochastic multi-armed bandit problem. At each time step, the algorithm chooses each arm with probability proportional to the probability of it being the current best arm. We modify the strategy by introducing a parameter h, which alters the importance of the probability of an arm being the current best arm. We show that, for two-armed bandits, the optimality of Thompson Sampling is robust to this perturbation within a range of parameter values.
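The abstract does not state the exact form of the modification, so the following is only a minimal sketch under stated assumptions: a Bernoulli bandit with Beta(1, 1) priors, where arm i is selected with probability proportional to p_i ** h, and p_i is a Monte Carlo estimate of the posterior probability that arm i is the current best. The power-law weighting, the requirement h > 0, and the names best_arm_probs, h_weights, and run_bandit are hypothetical and are not taken from the paper. With h = 1 the sketch reduces to ordinary Thompson Sampling.

```python
import numpy as np


def best_arm_probs(alpha, beta, rng, n_samples=2000):
    """Monte Carlo estimate of P(arm i has the highest mean) under Beta posteriors."""
    draws = rng.beta(alpha, beta, size=(n_samples, len(alpha)))
    winners = draws.argmax(axis=1)
    return np.bincount(winners, minlength=len(alpha)) / n_samples


def h_weights(p, h):
    """Assumed modification (not confirmed by the source): weights proportional to p_i ** h, h > 0."""
    w = np.power(p, h)
    return w / w.sum()


def run_bandit(true_means, h, horizon, seed=0):
    """Simulate the h-weighted strategy on a Bernoulli bandit; return pull counts per arm."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # Beta posterior parameter: 1 + number of successes
    beta = np.ones(k)   # Beta posterior parameter: 1 + number of failures
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        p = best_arm_probs(alpha, beta, rng)
        arm = rng.choice(k, p=h_weights(p, h))
        reward = int(rng.random() < true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Under this assumed form, h > 1 sharpens the selection distribution toward the arm currently judged best, while 0 < h < 1 flattens it toward uniform exploration. For example, run_bandit([0.9, 0.1], h=1.0, horizon=200) simulates standard Thompson Sampling on a two-armed problem.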
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms
