Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems
Sondre Glimsdal, Ole-Christoffer Granmo

TL;DR
This paper introduces a Thompson Sampling-based method for solving stochastic point location and root-finding problems in deceptive environments, effectively balancing exploration and exploitation even with erroneous feedback.
Contribution
The paper proposes a novel Bayesian approach with Thompson Sampling for the SPL problem, capable of handling deceptive feedback and improving over existing algorithms.
Findings
Outperforms competing algorithms in deceptive environments
Successfully solves stochastic root-finding problems with erroneous feedback
Provides a scalable Bayesian framework for continuous action spaces
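One way to picture the link between root-finding and left/right feedback is the minimal sketch below. It is an illustration under stated assumptions, not the authors' construction: the function is assumed to be increasing with a single root, evaluations are assumed to carry Gaussian noise, and the names `root_feedback` and `fraction_right` are hypothetical. The sign of a noisy evaluation is turned into a "left"/"right" hint, and the hint is sometimes wrong because of the noise.

```python
import random

def root_feedback(f, x, noise_sd, rng):
    """Turn one noisy evaluation of f at x into a left/right hint.

    Assumption: f is increasing, so a negative value suggests the root
    lies to the right of x. Noise can flip the sign, which makes the
    hint erroneous with some probability.
    """
    y = f(x) + rng.gauss(0.0, noise_sd)
    return "right" if y < 0 else "left"

def fraction_right(f, x, noise_sd, n, seed=0):
    """Empirical share of 'right' hints at a fixed query point x."""
    rng = random.Random(seed)
    hits = sum(root_feedback(f, x, noise_sd, rng) == "right" for _ in range(n))
    return hits / n

if __name__ == "__main__":
    f = lambda x: x - 0.4  # hypothetical test function with root at 0.4
    print(fraction_right(f, 0.1, 0.2, 2000))  # mostly "right"
    print(fraction_right(f, 0.9, 0.2, 2000))  # mostly "left"
```

Near the root the noise dominates the signal, so the hints there are close to coin flips; this is one reason a search scheme that tolerates erroneous feedback is useful in this setting.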
Abstract
The multi-armed bandit problem forms the foundation for solving a wide range of on-line stochastic optimization problems through a simple, yet effective mechanism. One simply casts the problem as a gambler who repeatedly pulls one out of N slot machine arms, eliciting random rewards. Learning of reward probabilities is then combined with reward maximization, by carefully balancing reward exploration against reward exploitation. In this paper, we address a particularly intriguing variant of the multi-armed bandit problem, referred to as the Stochastic Point Location (SPL) Problem. The gambler is here only told whether the optimal arm (point) lies to the "left" or to the "right" of the arm pulled, with the feedback being erroneous with some probability. This formulation thus captures optimization in continuous action spaces with both informative and deceptive…
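The SPL setting described in the abstract can be sketched in a few lines. The code below is a simplified illustration, not the paper's algorithm: it assumes the search interval is [0, 1], discretizes it into a grid, and assumes the probability of correct feedback (`p_correct`) is known and not equal to 0.5. The names `noisy_oracle` and `thompson_spl` are hypothetical. Each step samples a candidate point from the current posterior (the Thompson Sampling step), queries the oracle there, and applies a Bayesian update.

```python
import numpy as np

def noisy_oracle(x, lam_star, p_correct, rng):
    """Return True for 'right' (optimum >= x), False for 'left'.

    The truthful answer is given with probability p_correct and is
    flipped otherwise. p_correct < 0.5 models a deceptive environment.
    """
    truth_right = lam_star >= x
    correct = rng.random() < p_correct
    return truth_right if correct else not truth_right

def thompson_spl(lam_star, p_correct, n_steps=3000, n_grid=101, seed=0):
    """Estimate lam_star on a grid via posterior sampling.

    Assumptions (not from the paper): known p_correct in (0, 1),
    p_correct != 0.5, uniform prior over grid points.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_grid)
    log_post = np.zeros(n_grid)  # unnormalized log-posterior, uniform prior
    for _ in range(n_steps):
        # Normalize in a numerically stable way.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        # Thompson step: query a point drawn from the posterior.
        x = rng.choice(grid, p=post)
        says_right = noisy_oracle(x, lam_star, p_correct, rng)
        # Candidates that predict the same answer as the oracle gave.
        agree = (grid >= x) == says_right
        log_post += np.where(agree, np.log(p_correct), np.log(1.0 - p_correct))
    return grid[np.argmax(log_post)]

if __name__ == "__main__":
    print(thompson_spl(0.637, 0.8))  # informative feedback
    print(thompson_spl(0.637, 0.2))  # deceptive feedback
```

Because the likelihood uses `p_correct` directly, a deceptive oracle (accuracy below 0.5) is handled by the same update: an answer of "left" then counts as evidence for "right". Handling an unknown accuracy would require extending the posterior to cover it as well, which this sketch does not attempt.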
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Machine Learning and Algorithms
