Thompson Sampling on Asymmetric $\alpha$-Stable Bandits
Zhendong Shi, Ercan E. Kuruoglu, Xiaoli Wei

TL;DR
This paper explores the application of Thompson Sampling to multi-armed bandits with rewards following unknown asymmetric alpha-stable distributions, focusing on modeling financial and wireless data.
Contribution
It introduces the use of Thompson Sampling for bandits with asymmetric alpha-stable reward distributions, addressing a novel class of problems in reinforcement learning.
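Simulating this setting requires drawing rewards from asymmetric (skewed) alpha-stable laws. Below is a minimal sketch of the standard Chambers-Mallows-Stuck (CMS) generator in the S1 parameterization (Samorodnitsky-Taqqu). It is not taken from the paper, and the function name `sample_alpha_stable` is hypothetical. `alpha` in (0, 2] sets the tail heaviness, `beta` in [-1, 1] sets the asymmetry (beta = 0 is symmetric), and `scale` and `loc` are the usual scale and location.

```python
import math
import random
from typing import Optional


def sample_alpha_stable(alpha: float, beta: float, scale: float = 1.0,
                        loc: float = 0.0,
                        rng: Optional[random.Random] = None) -> float:
    """Draw one S_alpha(scale, beta, loc) variate (S1 parameterization) via CMS.

    Hypothetical helper for illustration; not the paper's code.
    """
    if not (0.0 < alpha <= 2.0 and -1.0 <= beta <= 1.0 and scale > 0.0):
        raise ValueError("need 0 < alpha <= 2, -1 <= beta <= 1, scale > 0")
    rng = rng or random.Random()
    v = rng.uniform(-math.pi / 2, math.pi / 2)  # V ~ Uniform(-pi/2, pi/2)
    w = rng.expovariate(1.0)                     # W ~ Exponential(1)
    if alpha != 1.0:
        t = beta * math.tan(math.pi * alpha / 2)
        b = math.atan(t) / alpha                  # shift that encodes the skew
        s = (1.0 + t * t) ** (1.0 / (2.0 * alpha))
        x = (s * math.sin(alpha * (v + b)) / math.cos(v) ** (1.0 / alpha)
             * (math.cos(v - alpha * (v + b)) / w) ** ((1.0 - alpha) / alpha))
        return scale * x + loc
    # alpha == 1 needs its own formula plus a log-scale location correction.
    hp = math.pi / 2
    x = (1.0 / hp) * ((hp + beta * v) * math.tan(v)
                      - beta * math.log(hp * w * math.cos(v) / (hp + beta * v)))
    return scale * x + (1.0 / hp) * beta * scale * math.log(scale) + loc
```

As sanity checks, `alpha = 2` reduces to a Gaussian with variance `2 * scale**2`, and `alpha < 1` with `beta = 1` yields only values at or above `loc`. Positive `beta` makes the right tail heavier, which is the kind of asymmetry the paper targets in financial and wireless data.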
Findings
Demonstrates the effectiveness of Thompson Sampling in asymmetric alpha-stable bandit models
Provides insights into modeling financial and wireless data with alpha-stable distributions
Suggests potential applications in finance and wireless communication
Abstract
In algorithm optimization for reinforcement learning, handling the exploration-exploitation dilemma is particularly important. The multi-armed bandit problem can optimize the proposed solutions by changing the reward distribution, realizing a dynamic balance between exploration and exploitation. Thompson Sampling is a common method for solving the multi-armed bandit problem and has been used to explore data that follow various probability laws. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem in which rewards follow unknown asymmetric $\alpha$-stable distributions, and we explore its applications in modelling financial and wireless data.
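The generic Thompson Sampling loop works as follows: draw one sample per arm from the posterior over that arm's reward parameter, play the arm with the largest draw, and update that arm's posterior with the observed reward. The sketch below assumes a Gaussian working model (a N(0, 1) prior on each arm's mean with a unit-variance Gaussian likelihood) purely as a stand-in. This summary does not describe the paper's posterior for asymmetric alpha-stable rewards, and a Gaussian model is a poor fit for heavy tails. The function name and the `pull(arm)` callback are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def gaussian_thompson_sampling(
    pull: Callable[[int], float],
    n_arms: int,
    horizon: int,
    rng: random.Random,
) -> Tuple[List[int], float]:
    """Thompson Sampling with a Gaussian working model (stand-in posterior).

    Posterior for arm k's mean: N(sums[k] / (counts[k] + 1), 1 / (counts[k] + 1)),
    i.e. a N(0, 1) prior updated with unit-variance Gaussian observations.
    Returns per-arm pull counts and the total collected reward.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total = 0.0
    for _ in range(horizon):
        # Sample one plausible mean per arm from its current posterior.
        draws = [
            rng.gauss(sums[k] / (counts[k] + 1), 1.0 / math.sqrt(counts[k] + 1))
            for k in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest (probability matching).
        arm = max(range(n_arms), key=lambda k: draws[k])
        reward = pull(arm)
        # Conjugate update: only the played arm's statistics change.
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total
```

For example, with two Gaussian arms of means 0 and 1, `gaussian_thompson_sampling(lambda a: env.gauss([0.0, 1.0][a], 1.0), 2, 2000, random.Random(1))` concentrates its pulls on the second arm. Because `pull` can wrap any reward simulator, including an alpha-stable one, swapping in a heavy-tailed environment shows where the Gaussian posterior step needs to be replaced.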
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
