Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis
Hao Li, Dong Liang, Zheng Xie

TL;DR
This paper proposes Meta-TSLB, a modified meta-Thompson sampling algorithm for linear contextual bandits, providing theoretical regret bounds and demonstrating its adaptability and generalization in experiments.
Contribution
It introduces Meta-TSLB, extending meta-Thompson sampling to linear bandits, with theoretical analysis and empirical validation of its effectiveness and adaptability.
Findings
Achieves an $O((m+\log(m))\sqrt{n\log(n)})$ Bayes regret bound.
Demonstrates strong generalization to unseen bandit instances.
Shows competitive performance in experimental evaluations.
Abstract
Meta-learning is characterized by its ability to learn how to learn, enabling the adaptation of learning strategies across different tasks. Recent research introduced Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown prior distribution sampled from a meta-prior by interacting with bandit instances drawn from it. However, its analysis was limited to Gaussian bandits. The contextual multi-armed bandit framework is an extension of the Gaussian bandit, which challenges the agent to use context vectors to predict the most valuable arms, optimally balancing exploration and exploitation to minimize regret over time. This paper introduces the Meta-TSLB algorithm, a modified Meta-TS for linear contextual bandits. We theoretically analyze Meta-TSLB and derive an $O((m+\log(m))\sqrt{n\log(n)})$ bound on its Bayes regret, in which $m$ represents the number of bandit instances, and …
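The setup the abstract describes can be illustrated with a minimal sketch. It assumes a Gaussian meta-prior over an unknown prior mean, a known prior covariance, and Gaussian reward noise. The function names, the dimensions, and the meta-posterior update (a standard hierarchical-Gaussian marginal-likelihood update) are illustrative assumptions and should not be read as the paper's Meta-TSLB update rule.

```python
import numpy as np

def sample_from_precision(prec, b, rng):
    """Draw from N(prec^{-1} b, prec^{-1}) using a Cholesky factor of the precision."""
    mean = np.linalg.solve(prec, b)
    L = np.linalg.cholesky(prec)              # prec = L @ L.T
    z = rng.standard_normal(len(b))
    return mean + np.linalg.solve(L.T, z)     # L^{-T} z has covariance prec^{-1}

def run_lin_ts(theta, prior_mean, prior_cov, arms, n, sigma, rng):
    """Run n rounds of linear Thompson sampling on one task."""
    prec = np.linalg.inv(prior_cov)           # posterior precision of theta
    b = prec @ prior_mean                     # precision-weighted posterior mean
    best = np.max(arms @ theta)
    X, y, regret = [], [], 0.0
    for _ in range(n):
        sample = sample_from_precision(prec, b, rng)
        x = arms[np.argmax(arms @ sample)]    # act greedily w.r.t. the sample
        r = x @ theta + sigma * rng.standard_normal()
        prec += np.outer(x, x) / sigma**2     # Bayesian linear regression update
        b += x * r / sigma**2
        X.append(x)
        y.append(r)
        regret += best - x @ theta
    return np.array(X), np.array(y), regret

def meta_ts(m, n, d=3, k=10, sigma=0.5, seed=0):
    """Hypothetical meta-loop over m tasks of n rounds each (illustration only)."""
    rng = np.random.default_rng(seed)
    mu_q, cov_q = np.zeros(d), np.eye(d)      # assumed meta-prior over the prior mean
    cov_0 = 0.1 * np.eye(d)                   # assumed known prior covariance
    mu_star = rng.multivariate_normal(mu_q, cov_q)   # unknown prior mean
    meta_prec = np.linalg.inv(cov_q)
    meta_b = meta_prec @ mu_q
    regrets = []
    for _ in range(m):
        theta = rng.multivariate_normal(mu_star, cov_0)   # new bandit instance
        arms = rng.standard_normal((k, d))                # arm feature vectors
        mu_hat = sample_from_precision(meta_prec, meta_b, rng)
        X, y, reg = run_lin_ts(theta, mu_hat, cov_0, arms, n, sigma, rng)
        # Marginally, y | mu_star ~ N(X mu_star, X cov_0 X^T + sigma^2 I).
        M = X @ cov_0 @ X.T + sigma**2 * np.eye(n)
        meta_prec += X.T @ np.linalg.solve(M, X)
        meta_b += X.T @ np.linalg.solve(M, y)
        regrets.append(reg)
    return np.array(regrets), mu_star, np.linalg.solve(meta_prec, meta_b)
```

Calling `meta_ts(m=20, n=50)` returns the per-task regrets, the hidden prior mean, and the meta-posterior mean, so one can check whether later tasks tend to incur less regret as the sampled prior mean concentrates.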
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Speech and Audio Processing
