Analysis of Search Heuristics in the Multi-Armed Bandit Setting
Jasmin Brandt, Barbara Hammer, Timo Kötzing, Jurek Sander

TL;DR
This paper analyzes how different search heuristics perform in the Multi-Armed Bandit setting, focusing on their ability to identify the Condorcet winner through dueling comparisons.
Contribution
It demonstrates the limitations of evolutionary algorithms in finding the Condorcet winner and proposes a simple EDA that performs better in this task.
Findings
Evolutionary algorithms, in their stationary distribution, rarely identify the Condorcet winner.
A Max-Min Ant System-based EDA effectively maintains the Condorcet winner with high probability.
Repeated duels can improve the probability that the (1+1) EA selects the Condorcet winner.
Abstract
We consider the classic Multi-Armed Bandit setting to understand the exploration/exploitation tradeoffs made by different search heuristics. Since many search heuristics work by comparing different options (in evolutionary algorithms called "individuals"; in the Bandit literature called "arms"), we work with the "Dueling Bandits" setting. In each iteration, a comparison between different arms can be made; in the binary stochastic setting, each arm has a fixed winning probability against any other arm. A Condorcet winner is any arm that beats every other arm with a probability strictly higher than 1/2. We show that evolutionary algorithms are rather bad at identifying the Condorcet winner: Even if the Condorcet winner beats every other arm with a probability , the (1+1) EA, in its stationary distribution, chooses the Condorcet winner only with constant probability if…
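To make the setting in the abstract concrete, here is a minimal Python sketch of a binary stochastic dueling bandit with a (1+1) EA-style search on top of it. It is an illustration under stated assumptions, not the paper's algorithm: the preference matrix `P`, the uniform choice of a challenger as the "mutation" step, the majority rule over `duels_per_step` duels, and all function names are hypothetical choices made for this example.

```python
import random


def condorcet_winner(P):
    """Return the index of the arm that beats every other arm with
    probability strictly above 1/2, or None if no such arm exists."""
    n = len(P)
    for i in range(n):
        if all(P[i][j] > 0.5 for j in range(n) if j != i):
            return i
    return None


def duel(P, i, j, rng):
    """Return True if arm i wins one stochastic duel against arm j.
    P[i][j] is the fixed probability that arm i beats arm j."""
    return rng.random() < P[i][j]


def one_plus_one_ea(P, steps, duels_per_step=1, seed=0):
    """Run a (1+1) EA-style search and return how often each arm was held.

    Assumption (not taken from the paper): the mutation picks a different
    arm uniformly at random, and the challenger replaces the current arm
    only if it wins a strict majority of the duels in this step.
    """
    rng = random.Random(seed)
    n = len(P)
    current = rng.randrange(n)
    visits = [0] * n
    for _ in range(steps):
        # Sample a challenger uniformly from the arms other than `current`.
        challenger = rng.randrange(n - 1)
        if challenger >= current:
            challenger += 1
        wins = sum(duel(P, challenger, current, rng) for _ in range(duels_per_step))
        if 2 * wins > duels_per_step:
            current = challenger
        visits[current] += 1
    return [v / steps for v in visits]


# Hypothetical instance: arm 0 beats every other arm with probability 0.6.
P = [
    [0.5, 0.6, 0.6],
    [0.4, 0.5, 0.5],
    [0.4, 0.5, 0.5],
]
freq_single = one_plus_one_ea(P, steps=10_000, duels_per_step=1)
freq_repeated = one_plus_one_ea(P, steps=10_000, duels_per_step=15)
```

Comparing `freq_single[0]` with `freq_repeated[0]` gives an empirical feel for how repeating duels changes the fraction of time the search holds the Condorcet winner in this toy model; the numbers depend on the assumed matrix and update rule and say nothing about the paper's bounds.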
