Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards
Bo Xue, Yimu Wang, Yuanyu Wan, Jinfeng Yi, Lijun Zhang

TL;DR
This paper introduces two novel algorithms for generalized linear bandits with heavy-tailed rewards, achieving near-optimal regret bounds and practical online learning capabilities, addressing limitations of existing methods for unbounded reward scenarios.
Contribution
The paper proposes truncation and mean-of-medians algorithms for heavy-tailed rewards, with improved regret bounds and practical online learning support.
Findings
Achieve regret bound of O(dT^{1/(1+\epsilon)})
Support online learning with truncation-based algorithm
Require only O(log T) rewards for mean-of-medians algorithm
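The exponent in the regret bound above can be read off at its two extremes. This is a small worked calculation, not a statement from the paper. It assumes the usual convention for this setting: \epsilon lies in (0, 1] and indexes the bounded (1+\epsilon)-th moment of the rewards.

```latex
\begin{align*}
\epsilon = 1 &: \quad dT^{1/(1+\epsilon)} = dT^{1/2} = d\sqrt{T} \\
\epsilon \to 0 &: \quad dT^{1/(1+\epsilon)} \to dT
\end{align*}
```

With finite variance (\epsilon = 1) the bound has the familiar d\sqrt{T} shape. As the tails get heavier (\epsilon near 0) it approaches linear growth in T.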
Abstract
This paper investigates the problem of generalized linear bandits with heavy-tailed rewards, whose (1+\epsilon)-th moment is bounded for some \epsilon \in (0, 1]. Although there exist methods for generalized linear bandits, most of them focus on bounded or sub-Gaussian rewards and are not well-suited for many real-world scenarios, such as financial markets and web-advertising. To address this issue, we propose two novel algorithms based on truncation and mean of medians. These algorithms achieve an almost optimal regret bound of \widetilde{O}(dT^{1/(1+\epsilon)}), where d is the dimension of contextual information and T is the time horizon. Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches. Additionally, our mean-of-medians-based algorithm requires only O(\log T) rewards and one estimator per epoch,…
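The two robustification ideas named in the abstract, truncation and mean of medians, can be illustrated on plain mean estimation. The sketch below is not the paper's algorithms, which build these ideas into estimators for generalized linear models. The function names, the fixed `threshold`, the epoch/pull layout, and the Student-t reward distribution are all assumptions chosen for illustration.

```python
import numpy as np


def truncate(rewards, threshold):
    """Zero out rewards whose magnitude exceeds `threshold` (hypothetical fixed cutoff)."""
    rewards = np.asarray(rewards, dtype=float)
    return np.where(np.abs(rewards) <= threshold, rewards, 0.0)


def truncated_mean(rewards, threshold):
    """Average of the truncated rewards; extreme draws contribute zero."""
    return float(truncate(rewards, threshold).mean())


def mean_of_medians(pulls):
    """Take the median of each row (one epoch of repeated pulls), then average the medians."""
    pulls = np.asarray(pulls, dtype=float)
    return float(np.median(pulls, axis=1).mean())


# Illustrative heavy-tailed data: Student-t with 1.5 degrees of freedom has
# mean 0 but infinite variance. It is symmetric, so its median equals its mean;
# for skewed rewards a plain median would be biased for the mean.
rng = np.random.default_rng(0)
pulls = rng.standard_t(df=1.5, size=(200, 9))  # 200 epochs, 9 pulls per epoch

naive_estimate = float(pulls.mean())
truncated_estimate = truncated_mean(pulls.ravel(), threshold=10.0)
mom_estimate = mean_of_medians(pulls)
```

Comparing `naive_estimate` with the two robust estimates across different seeds shows why a plain average is fragile here: a single extreme draw can dominate it, while truncation and per-epoch medians limit the influence of any one sample.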
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Age of Information Optimization
