A study of Thompson Sampling with Parameter h

Qiang Ha

arXiv:1710.02174 · cs.LG · October 9, 2017

TL;DR

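This paper investigates a modified Thompson Sampling algorithm with a parameter h that adjusts how much weight is given to the probability of an arm being the current best arm, and shows that the algorithm's optimality is robust to this change, within a range of h values, for two-armed bandit problems.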

Contribution

It introduces a parameterized version of Thompson Sampling and analyzes its robustness, extending the understanding of the algorithm's optimality under perturbations.

Findings

01

Optimality of Thompson Sampling is robust within a range of h values.

02

The modified algorithm maintains performance in two-armed bandit scenarios.

03

Parameter h influences the probability weighting in the sampling strategy.

Abstract

Thompson Sampling is a well-known Bayesian algorithm for solving the stochastic multi-armed bandit problem. At each time step, the algorithm chooses each arm with probability proportional to the probability of it being the current best arm. We modify the strategy by introducing a parameter h that alters the importance of the probability of an arm being the current best arm. We show that the optimality of Thompson Sampling is robust to this perturbation within a range of parameter values for two-armed bandits.
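The abstract does not give the exact form of the modification, so the following is only a minimal sketch under a stated assumption: each arm is chosen with probability proportional to p_i^h, where p_i is the posterior probability that arm i is the current best. Under that assumption, h = 1 corresponds to standard Thompson Sampling and h = 0 to uniform random play. The Bernoulli rewards, the Beta(1, 1) priors, the Monte Carlo estimate of p_i, the restriction to h >= 0, and all function names are illustrative choices, not taken from the paper.

```python
import random


def prob_best(alpha, beta, rng, n_samples=200):
    """Monte Carlo estimate of the posterior probability that each arm is best."""
    wins = [0] * len(alpha)
    for _ in range(n_samples):
        draws = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]


def selection_probs(p_best, h):
    """ASSUMED form of the modification: weight each arm by p_best**h, then normalize."""
    if h < 0:
        raise ValueError("this sketch only handles h >= 0")
    weights = [p ** h for p in p_best]
    total = sum(weights)
    return [w / total for w in weights]


def run(true_means, h, horizon, seed=0):
    """Play a Bernoulli bandit for `horizon` steps and return the pull count per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta posterior: 1 + number of successes
    beta = [1.0] * k   # Beta posterior: 1 + number of failures
    pulls = [0] * k
    for _ in range(horizon):
        probs = selection_probs(prob_best(alpha, beta, rng), h)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

For instance, `run([0.9, 0.1], h=1.0, horizon=300)` simulates a two-armed bandit with the assumed h = 1 (standard) setting; rerunning with other values of h gives a rough empirical feel for how the weighting changes arm selection, though it says nothing about the paper's theoretical range.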

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms