Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback
Qirun Zeng, Xuchuang Wang, Jiayi Shen, Xutong Liu, Fang Kong, Jinhang Zuo

TL;DR
This paper introduces a hybrid feedback approach for best arm identification in generalized linear bandits, combining absolute and relative feedback to improve sample efficiency with theoretical guarantees.
Contribution
The paper develops a likelihood-ratio-based confidence sequence and a hybrid Track-and-Stop algorithm that adaptively allocates queries, and extends the framework to cost-aware settings.
Findings
The proposed algorithms outperform baseline methods in sample efficiency.
The confidence set is explicit and unifies heterogeneous observations.
The framework comes with a correctness guarantee and high-probability upper bounds on the stopping time.
Abstract
We study fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model: at each round, the learner may query either (i) absolute reward feedback from a single arm or (ii) relative (dueling) feedback from an arm pair, both governed by generalized linear models. We introduce a likelihood-ratio-based confidence sequence that unifies heterogeneous generalized linear observations and yields an explicit ellipsoidal confidence set under a self-concordance assumption. Building on this confidence set, we propose a hybrid Track-and-Stop algorithm that adaptively allocates queries by tracking a minimax-optimal design over a joint action space of arms and pairs. We establish δ-correctness and provide high-probability upper bounds on the stopping time. We further extend the framework to a cost-aware setting that accounts for heterogeneous acquisition…
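To make the hybrid feedback model concrete, the following is a minimal Python sketch of the two query types and the joint action space of arms and pairs. It is illustrative only and not the authors' implementation. The abstract says only that both feedback types follow generalized linear models, so the logistic link, the use of the feature difference for dueling queries, and all function names (`absolute_feedback`, `relative_feedback`, `joint_actions`) are assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic link function (an assumed choice of GLM link)."""
    return 1.0 / (1.0 + np.exp(-z))


def absolute_feedback(theta, x, rng):
    """Query one arm: Bernoulli reward with mean sigmoid(x . theta)."""
    return int(rng.random() < sigmoid(x @ theta))


def relative_feedback(theta, x_i, x_j, rng):
    """Query an arm pair: returns 1 if arm i is preferred over arm j.

    Assumption: the preference probability is sigmoid((x_i - x_j) . theta).
    """
    return int(rng.random() < sigmoid((x_i - x_j) @ theta))


def joint_actions(arms):
    """Enumerate the joint action space: single arms plus unordered pairs.

    Each entry is (kind, indices, feature), where the feature is the arm
    vector for absolute queries and the difference vector for relative ones.
    """
    n = len(arms)
    actions = [("abs", (i,), arms[i]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            actions.append(("rel", (i, j), arms[i] - arms[j]))
    return actions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.array([1.0, -0.5])  # hypothetical unknown parameter
    arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    print(absolute_feedback(theta, arms[0], rng))
    print(relative_feedback(theta, arms[0], arms[1], rng))
    print(len(joint_actions(arms)))
```

Under these assumptions both query types are linear in the same parameter `theta`, which is what allows a single design over arms and pairs to be tracked: each action contributes a feature vector, whether it is an arm or a difference of arms.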
