Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems

Sondre Glimsdal; Ole-Christoffer Granmo

arXiv:1708.01791·cs.AI·August 8, 2017

TL;DR

This paper introduces a Thompson Sampling-based method for solving stochastic point location and root-finding problems in deceptive environments, effectively balancing exploration and exploitation even with erroneous feedback.

Contribution

The paper proposes a novel Bayesian approach with Thompson Sampling for the SPL problem, capable of handling deceptive feedback and improving over existing algorithms.

Findings

01 Outperforms competing algorithms in deceptive environments

02 Successfully solves stochastic root-finding problems with erroneous feedback

03 Provides a scalable Bayesian framework for continuous action spaces

Abstract

The multi-armed bandit problem forms the foundation for solving a wide range of on-line stochastic optimization problems through a simple, yet effective mechanism. One simply casts the problem as a gambler that repeatedly pulls one out of N slot machine arms, eliciting random rewards. Learning of reward probabilities is then combined with reward maximization, by carefully balancing reward exploration against reward exploitation. In this paper, we address a particularly intriguing variant of the multi-armed bandit problem, referred to as the Stochastic Point Location (SPL) Problem. The gambler is here only told whether the optimal arm (point) lies to the "left" or to the "right" of the arm pulled, with the feedback being erroneous with probability $1 - \pi$. This formulation thus captures optimization in continuous action spaces with both informative and deceptive…
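The feedback model described in the abstract can be sketched in a few lines of Python. This is only a simulation of the environment, not the authors' search algorithm. It assumes the search space is the unit interval and reads "informative" as $\pi > 0.5$ and "deceptive" as $\pi < 0.5$; the names `spl_feedback` and `empirical_accuracy`, the optimum at 0.7, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
import random


def spl_feedback(guess, optimum, pi, rng):
    """Report whether the optimum lies 'left' or 'right' of the guess.

    The report is truthful with probability pi and flipped with
    probability 1 - pi. Ties (guess == optimum) are reported as
    'right' before any flip, an arbitrary choice for this sketch.
    """
    truth = "left" if optimum < guess else "right"
    if rng.random() < pi:
        return truth
    return "right" if truth == "left" else "left"


def empirical_accuracy(pi, trials=20000, seed=0):
    """Fraction of truthful reports over random guesses in [0, 1)."""
    rng = random.Random(seed)
    optimum = 0.7  # hypothetical location of the optimal point
    correct = 0
    for _ in range(trials):
        guess = rng.random()
        truth = "left" if optimum < guess else "right"
        if spl_feedback(guess, optimum, pi, rng) == truth:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    # Assumed reading: pi > 0.5 is informative, pi < 0.5 is deceptive.
    for pi in (0.8, 0.2):
        print(pi, empirical_accuracy(pi))
```

Under a deceptive setting such as `pi = 0.2`, most reports point away from the optimum, so a learner that trusts the feedback at face value would drift in the wrong direction. This is why the environment type has to be inferred alongside the point location.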

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Machine Learning and Algorithms