Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis

Hao Li; Dong Liang; Zheng Xie

arXiv:2409.06329·stat.ML·September 12, 2024

Open Access

TL;DR

This paper proposes Meta-TSLB, a modified meta-Thompson sampling algorithm for linear contextual bandits, providing theoretical regret bounds and demonstrating its adaptability and generalization in experiments.

Contribution

It introduces Meta-TSLB, extending meta-Thompson sampling to linear bandits, with theoretical analysis and empirical validation of its effectiveness and adaptability.

Findings

01

Achieves an $O((m+\log(m))\sqrt{n\log(n)})$ Bayes regret bound.

02

Demonstrates strong generalization to unseen bandit instances.

03

Shows competitive performance in experimental evaluations.

Abstract

Meta-learning is characterized by its ability to learn how to learn, enabling the adaptation of learning strategies across different tasks. Recent research introduced Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown prior distribution sampled from a meta-prior by interacting with bandit instances drawn from it. However, its analysis was limited to Gaussian bandits. The contextual multi-armed bandit framework extends the Gaussian bandit: it challenges an agent to use context vectors to predict the most valuable arms, balancing exploration and exploitation to minimize regret over time. This paper introduces the Meta-TSLB algorithm, a modified Meta-TS for linear contextual bandits. We theoretically analyze Meta-TSLB and derive an $O((m+\log(m))\sqrt{n\log(n)})$ bound on its Bayes regret, in which $m$ represents the number of bandit instances, and $n$ …
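To make the setting in the abstract concrete, the following is a minimal sketch of Thompson sampling on a linear Gaussian bandit whose prior mean is itself drawn from a meta-prior. It is not the paper's Meta-TSLB algorithm: the meta-posterior update across tasks is omitted, and the function names (`posterior`, `run_task`), the Gaussian reward model, the noise level `sigma`, and the meta-prior parameters `mu_q`, `Sigma_q` are all assumptions chosen for illustration.

```python
import numpy as np


def posterior(mu0, Sigma0, X, r, sigma=1.0):
    """Gaussian posterior over theta given contexts X (t x d) and rewards r (t,).

    Assumes r = X @ theta + N(0, sigma^2) noise and prior theta ~ N(mu0, Sigma0).
    """
    prec0 = np.linalg.inv(Sigma0)
    prec = prec0 + X.T @ X / sigma**2
    Sigma = np.linalg.inv(prec)
    mu = Sigma @ (prec0 @ mu0 + X.T @ r / sigma**2)
    return mu, Sigma


def run_task(mu0, Sigma0, theta, arms, n, rng, sigma=1.0):
    """Run n rounds of linear Thompson sampling on one bandit instance.

    arms is a (K x d) matrix of fixed arm feature vectors (an assumption;
    contexts could also change per round). Returns the cumulative regret.
    """
    d = len(mu0)
    X = np.empty((0, d))
    r = np.empty(0)
    best = np.max(arms @ theta)
    regret = 0.0
    for _ in range(n):
        mu, Sigma = posterior(mu0, Sigma0, X, r, sigma)
        theta_s = rng.multivariate_normal(mu, Sigma)  # posterior sample
        a = int(np.argmax(arms @ theta_s))            # act greedily on the sample
        reward = arms[a] @ theta + sigma * rng.normal()
        X = np.vstack([X, arms[a]])
        r = np.append(r, reward)
        regret += best - arms[a] @ theta
    return regret


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, K, n, m = 2, 5, 50, 3
    mu_q, Sigma_q = np.zeros(d), np.eye(d)   # hypothetical meta-prior
    Sigma0 = 0.1 * np.eye(d)                 # hypothetical known prior covariance
    mu_star = rng.multivariate_normal(mu_q, Sigma_q)  # unknown prior mean
    arms = rng.normal(size=(K, d))
    for task in range(m):
        theta = rng.multivariate_normal(mu_star, Sigma0)  # task instance
        mu_hat = rng.multivariate_normal(mu_q, Sigma_q)   # sampled prior guess
        print(task, run_task(mu_hat, Sigma0, theta, arms, n, rng))
```

In a full meta-learning loop, the distribution that `mu_hat` is drawn from would be updated after each task so that later tasks start from a better prior; how that update is done is exactly what the paper specifies and is deliberately left out here.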

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Speech and Audio Processing