Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update
Yu-Jie Zhang, Sheng-An Xu, Peng Zhao, Masashi Sugiyama

TL;DR
This paper introduces a new algorithm for generalized linear bandits that achieves near-optimal regret with constant-time updates per round, combining statistical and computational efficiency.
Contribution
The paper presents a jointly efficient algorithm for GLBs that attains near-optimal regret with one-pass updates, leveraging a novel confidence-set analysis for the online mirror descent (OMD) estimator.
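The "one-pass" idea can be illustrated with a minimal sketch. This is not the authors' algorithm. It swaps in a generic online-Newton-step-style update for a logistic link (Bernoulli rewards), and it uses a hand-picked confidence width instead of the paper's OMD confidence set. The names `OnePassLogisticUCB`, `lam`, `beta`, and `eta` are hypothetical. The learner stores only the current estimate and an inverse curvature matrix and processes each observation exactly once. Its per-round cost is therefore O(K d^2) for K arms in dimension d, independent of the round index t.

```python
import numpy as np


def sigmoid(z):
    """Logistic link mu(z) = 1 / (1 + exp(-z)), i.e. Bernoulli rewards."""
    return 1.0 / (1.0 + np.exp(-z))


class OnePassLogisticUCB:
    """Hypothetical one-pass learner for a logistic GLB (illustration only).

    Stores only the current estimate `theta` and the inverse curvature
    matrix `H_inv`; past observations are never revisited, so memory and
    per-round time do not grow with the round index t.
    """

    def __init__(self, d, lam=1.0, beta=0.5, eta=1.0):
        self.theta = np.zeros(d)          # current parameter estimate
        self.H_inv = np.eye(d) / lam      # inverse of lam*I + sum_s w_s x_s x_s^T
        self.beta = beta                  # hand-picked confidence width (assumption)
        self.eta = eta                    # step size (assumption)

    def select(self, arms):
        """Pick the arm maximizing x^T theta + beta * ||x||_{H^{-1}}; O(K d^2)."""
        means = arms @ self.theta
        quad = np.einsum("kd,de,ke->k", arms, self.H_inv, arms)
        widths = np.sqrt(np.maximum(quad, 0.0))
        return int(np.argmax(means + self.beta * widths))

    def update(self, x, r):
        """Process (x, r) once with an online-Newton-style step; O(d^2)."""
        p = sigmoid(x @ self.theta)
        w = p * (1.0 - p)                 # logistic curvature at current estimate
        # Sherman-Morrison rank-one update of H^{-1} after adding w * x x^T
        Hx = self.H_inv @ x
        self.H_inv -= np.outer(Hx, Hx) * (w / (1.0 + w * (x @ Hx)))
        # Preconditioned gradient step on this single sample's logistic loss
        grad = (p - r) * x
        self.theta -= self.eta * (self.H_inv @ grad)


# Usage: simulate Bernoulli rewards from a hypothetical true parameter.
rng = np.random.default_rng(0)
d, K, T = 3, 5, 2000
theta_star = np.array([1.0, -0.5, 0.8])
agent = OnePassLogisticUCB(d)
for t in range(T):
    arms = rng.normal(size=(K, d)) / np.sqrt(d)
    x = arms[agent.select(arms)]
    r = float(rng.random() < sigmoid(x @ theta_star))
    agent.update(x, r)
```

The Sherman-Morrison update keeps the inverse matrix current without ever re-inverting it or revisiting old data. This is what keeps the cost of each round flat as t grows.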
Findings
Achieves near-optimal regret bounds for GLBs.
Operates with constant per-round time and space complexity.
Provides a tight confidence set for the OMD estimator.
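Confidence sets for GLB estimators are commonly ellipsoids around the current estimate. The following is a generic shape under assumed notation, not the paper's exact construction: estimate $\hat{\theta}_t$, positive-definite matrix $H_t$, radius $\beta_t$, and arm set $\mathcal{X}_t$. It is shown together with the optimistic arm-selection rule it feeds into. The closed form on the right follows from the Cauchy-Schwarz inequality in the $H_t$-norm.

```latex
\mathcal{C}_t = \left\{ \theta \in \mathbb{R}^d : \|\theta - \hat{\theta}_t\|_{H_t} \le \beta_t \right\},
\qquad \|v\|_{H} = \sqrt{v^\top H v},
\qquad
x_t = \arg\max_{x \in \mathcal{X}_t} \max_{\theta \in \mathcal{C}_t} x^\top \theta
    = \arg\max_{x \in \mathcal{X}_t} \left( x^\top \hat{\theta}_t + \beta_t \|x\|_{H_t^{-1}} \right)
```

A tighter (smaller) radius $\beta_t$ shrinks the exploration bonus $\beta_t \|x\|_{H_t^{-1}}$. In analyses of this type, that smaller bonus is what sharpens the resulting regret bound.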
Abstract
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their non-linear nature makes it significantly harder to achieve computational and statistical efficiency at the same time. Existing methods typically trade one objective for the other: they either incur high per-round costs to obtain optimal regret guarantees or sacrifice statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with O(1) time and space complexity per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator,…
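In standard GLB notation (assumed here, not quoted from the paper), the learner picks $x_t$ from an arm set $\mathcal{X}_t \subset \mathbb{R}^d$. The expected reward is a link function $\mu$ applied to a linear score with unknown parameter $\theta^*$. Performance is measured by cumulative regret against the best arm in each round. The two link functions below correspond to the Bernoulli and Poisson rewards mentioned in the abstract.

```latex
\begin{aligned}
&\mathbb{E}[r_t \mid x_t] = \mu(x_t^\top \theta^*), \qquad
\mu_{\text{Bernoulli}}(z) = \frac{1}{1 + e^{-z}}, \qquad
\mu_{\text{Poisson}}(z) = e^{z}, \\
&\mathrm{Reg}_T = \sum_{t=1}^{T} \Big[ \mu\big(x_t^{\star\top} \theta^*\big) - \mu\big(x_t^\top \theta^*\big) \Big],
\qquad x_t^\star = \arg\max_{x \in \mathcal{X}_t} \mu(x^\top \theta^*).
\end{aligned}
```

The classical linear bandit is recovered with the identity link $\mu(z) = z$.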
