Thompson Sampling for Unsupervised Sequential Selection
Arun Verma, Manjesh K. Hanawal, Nandyala Hemachandra

TL;DR
This paper introduces a Thompson Sampling algorithm for the Unsupervised Sequential Selection (USS) problem, a variant of stochastic multi-armed bandits in which the loss of an arm cannot be inferred from the observed feedback. The algorithm achieves near-optimal regret under the Weak Dominance condition.
Contribution
The paper proposes a Thompson Sampling algorithm for USS, shows that it achieves near-optimal regret under Weak Dominance, and reports better numerical performance than existing algorithms.
Findings
Achieves near-optimal regret in USS under Weak Dominance.
Outperforms existing algorithms in numerical experiments.
Provides theoretical analysis of Thompson Sampling in unsupervised settings.
Abstract
Thompson Sampling has generated significant interest because its empirical performance is better than that of upper confidence bound based algorithms. In this paper, we study a Thompson Sampling based algorithm for the Unsupervised Sequential Selection (USS) problem. The USS problem is a variant of the stochastic multi-armed bandits problem in which the loss of an arm cannot be inferred from the observed feedback. In the USS setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, the learner selects an arm and observes the feedback from all arms up to and including the selected arm. The learner's goal is to find the arm that minimizes the expected total loss. The total loss is the sum of the cost incurred for selecting the arm and the stochastic loss associated with the selected arm. The problem is challenging because, without knowing the mean loss, one cannot compute the total loss…
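The setup described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only shows the objective (selection cost plus mean loss) and the cascade feedback structure. All names and numbers are hypothetical, `costs[i]` is assumed to be the full cost of selecting arm `i`, and the mean losses are given here only for illustration, since in USS the learner does not know them and cannot estimate them directly from feedback.

```python
import random

def expected_total_loss(costs, mean_losses):
    """Expected total loss per arm: selection cost plus mean stochastic loss."""
    return [c + m for c, m in zip(costs, mean_losses)]

def best_arm(costs, mean_losses):
    """Index of the arm with the smallest expected total loss."""
    totals = expected_total_loss(costs, mean_losses)
    return min(range(len(totals)), key=totals.__getitem__)

def cascade_feedback(selected, feedback_probs, rng):
    """Binary feedback from arms 0..selected (inclusive), as in a cascade.

    feedback_probs is a hypothetical per-arm probability of emitting 1;
    arms after the selected one are not observed.
    """
    return [int(rng.random() < p) for p in feedback_probs[: selected + 1]]

# Hypothetical instance with three ordered arms.
costs = [0.0, 0.1, 0.3]          # assumed cost of selecting each arm
mean_losses = [0.4, 0.2, 0.15]   # unknown to the learner in USS

optimal = best_arm(costs, mean_losses)

rng = random.Random(0)
feedback = cascade_feedback(1, [0.5, 0.5, 0.5], rng)
```

In this instance the expected total losses are roughly 0.4, 0.3, and 0.45, so the second arm (index 1) is optimal even though the third arm has the lowest mean loss. Selecting index 1 reveals feedback from the first two arms only.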
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques
