Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling
Baihan Lin

TL;DR
This paper introduces Genetic Thompson Sampling, a novel bandit algorithm that integrates genetic algorithms to enhance decision-making in nonstationary environments, demonstrated through simulations and epidemic control applications.
Contribution
The paper presents what it frames as the first integration of evolutionary principles into multi-armed bandit algorithms, in the form of Genetic Thompson Sampling, and reports improved performance in dynamic settings.
Findings
Genetic Thompson Sampling outperforms baseline algorithms in nonstationary environments
The approach is effective in an epidemic control simulation
The paper provides an interactive visualization tool for the learning process
Abstract
As two popular schools of machine learning, online learning and evolutionary computation have become two important driving forces behind real-world decision-making engines for applications in biomedicine, economics, and engineering. Although prior work has used bandits to improve the optimization process of evolutionary algorithms, it remains unexplored how evolutionary approaches can improve the sequential decision-making of online learning agents such as multi-armed bandits. In this work, we propose Genetic Thompson Sampling, a bandit algorithm that maintains a population of agents and updates them with genetic principles such as elite selection, crossover, and mutation. Empirical results in multi-armed bandit simulation environments and a practical epidemic control problem suggest that by incorporating the genetic algorithm into the bandit…
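The abstract's description (a population of Thompson Sampling agents updated by elite selection, crossover, and mutation) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes Bernoulli rewards with Beta(1, 1) priors, one randomly chosen agent acting per round, fitness measured as mean reward per pull, per-arm crossover, and Gaussian mutation of the Beta parameters. All function names and parameters (`run`, `evolve`, `elite_frac`, `mutation_rate`, `evolve_every`) are hypothetical.

```python
import random

MIN_PARAM = 0.1  # assumption: floor that keeps Beta parameters strictly positive


def new_agent(n_arms):
    # One [alpha, beta] pair per arm, starting from a uniform Beta(1, 1) prior.
    return {"params": [[1.0, 1.0] for _ in range(n_arms)], "reward": 0.0, "pulls": 0}


def thompson_pick(agent, rng):
    # Sample a success rate per arm from its Beta posterior and play the argmax.
    samples = [rng.betavariate(a, b) for a, b in agent["params"]]
    return max(range(len(samples)), key=samples.__getitem__)


def fitness(agent):
    # Assumed fitness: mean reward per pull since the last generation.
    return agent["reward"] / max(1, agent["pulls"])


def crossover(p1, p2, rng):
    # For each arm, the child copies the [alpha, beta] pair of one parent.
    params = [list(rng.choice((a, b))) for a, b in zip(p1["params"], p2["params"])]
    return {"params": params, "reward": 0.0, "pulls": 0}


def mutate(agent, rate, rng):
    # With probability `rate`, perturb a parameter with unit Gaussian noise.
    for pair in agent["params"]:
        for i in range(2):
            if rng.random() < rate:
                pair[i] = max(MIN_PARAM, pair[i] + rng.gauss(0.0, 1.0))


def evolve(population, elite_frac, mutation_rate, rng):
    # Elite selection: keep the fittest agents, refill the rest with children.
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = min(len(ranked), max(2, int(elite_frac * len(ranked))))
    elites = ranked[:n_elite]
    children = []
    while len(elites) + len(children) < len(population):
        p1, p2 = rng.sample(elites, 2)
        child = crossover(p1, p2, rng)
        mutate(child, mutation_rate, rng)
        children.append(child)
    for agent in elites:  # reset fitness statistics for the next generation
        agent["reward"], agent["pulls"] = 0.0, 0
    return elites + children


def run(true_probs, n_agents=8, horizon=2000, evolve_every=50,
        elite_frac=0.5, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [new_agent(len(true_probs)) for _ in range(n_agents)]
    total = 0.0
    for t in range(1, horizon + 1):
        agent = rng.choice(population)  # assumption: one random agent acts
        arm = thompson_pick(agent, rng)
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        agent["params"][arm][0] += reward        # alpha counts successes
        agent["params"][arm][1] += 1.0 - reward  # beta counts failures
        agent["reward"] += reward
        agent["pulls"] += 1
        total += reward
        if t % evolve_every == 0:
            population = evolve(population, elite_frac, mutation_rate, rng)
    return total / horizon, population


if __name__ == "__main__":
    avg, _ = run([0.1, 0.5, 0.9])
    print(f"average reward per round: {avg:.3f}")
```

In this sketch, crossover lets a child combine what different elites have learned about different arms, while mutation injects noise into the posteriors, which is one plausible way a population could keep exploring when reward probabilities drift. The environment here is stationary for brevity; a nonstationary test would change `true_probs` during the run.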
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques
