Thompson Sampling on Asymmetric $\alpha$-Stable Bandits

Zhendong Shi; Ercan E. Kuruoglu; Xiaoli Wei

arXiv:2203.10214·stat.ML·March 28, 2022

TL;DR

This paper explores the application of Thompson Sampling to multi-armed bandits with rewards following unknown asymmetric alpha-stable distributions, focusing on modeling financial and wireless data.

Contribution

It introduces the use of Thompson Sampling for bandits with asymmetric alpha-stable reward distributions, addressing a novel class of problems in reinforcement learning.

Findings

01

Demonstrates the effectiveness of Thompson Sampling in asymmetric alpha-stable bandit models

02

Provides insights into modeling financial and wireless data with alpha-stable distributions

03

Suggests potential applications in finance and wireless communication

Abstract

In reinforcement learning, handling the exploration-exploitation dilemma is particularly important for algorithm optimization. The multi-armed bandit problem can optimize the proposed solutions by changing the reward distribution to achieve a dynamic balance between exploration and exploitation. Thompson Sampling is a common method for solving the multi-armed bandit problem and has been applied to data following a variety of distributions. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem in which rewards follow unknown asymmetric $\alpha$-stable distributions, and we explore its applications in modelling financial and wireless data.
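
To make the setting in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It draws asymmetric $\alpha$-stable rewards with the Chambers-Mallows-Stuck method (S1 parameterization, restricted here to $\alpha \neq 1$) and runs a plain Gaussian-posterior Thompson Sampling loop over the arm locations. The Gaussian model is a swapped-in stand-in chosen for brevity: it assumes a finite, known observation variance, which does not hold for $\alpha < 2$, so it only illustrates the sample-then-pull structure. The function names, the arm tuples, and the `prior_var` and `obs_var` parameters are all hypothetical.

```python
import math
import random


def sample_alpha_stable(alpha, beta, scale, loc, rng):
    """Draw one alpha-stable variate (S1 form) via Chambers-Mallows-Stuck.

    Sketch only: supports 0 < alpha <= 2 with alpha != 1.
    """
    if not (0.0 < alpha <= 2.0) or alpha == 1.0:
        raise ValueError("this sketch supports 0 < alpha <= 2 with alpha != 1")
    if not -1.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [-1, 1]")
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit-rate exponential
    while w == 0.0:                             # guard against a zero draw
        w = rng.expovariate(1.0)
    t = beta * math.tan(math.pi * alpha / 2)
    b = math.atan(t) / alpha                    # skewness shift
    s = (1.0 + t * t) ** (1.0 / (2.0 * alpha))  # skewness scale factor
    x = (s * math.sin(alpha * (u + b)) / math.cos(u) ** (1.0 / alpha)
         * (math.cos(u - alpha * (u + b)) / w) ** ((1.0 - alpha) / alpha))
    return scale * x + loc


def thompson_sampling(arms, horizon, rng, prior_var=100.0, obs_var=1.0):
    """Gaussian-posterior Thompson Sampling over arm locations.

    ASSUMPTION: a Normal(0, prior_var) prior and a known observation
    variance obs_var. This is a stand-in model, not the posterior
    used in the paper.
    arms: list of (alpha, beta, scale, loc) tuples.
    Returns (pull counts per arm, reward sums per arm).
    """
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for _ in range(horizon):
        draws = []
        for n, total in zip(counts, sums):
            precision = 1.0 / prior_var + n / obs_var
            mean = (total / obs_var) / precision
            # Sample a plausible location for this arm from its posterior.
            draws.append(rng.gauss(mean, math.sqrt(1.0 / precision)))
        k = max(range(len(arms)), key=draws.__getitem__)  # greedy on samples
        reward = sample_alpha_stable(*arms[k], rng)
        counts[k] += 1
        sums[k] += reward
    return counts, sums


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical arms: same tail index and skewness, different locations.
    arms = [(1.5, 0.5, 1.0, 0.0), (1.5, 0.5, 1.0, 1.0)]
    counts, sums = thompson_sampling(arms, horizon=500, rng=rng)
    print("pull counts per arm:", counts)
```

Because heavy-tailed rewards can produce extreme values, a single outlier can dominate the running sums in this simplified Gaussian model. That sensitivity is one reason a posterior tailored to $\alpha$-stable rewards is of interest.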

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms