Fully Gap-Dependent Bounds for Multinomial Logit Bandit
Jiaqi Yang

TL;DR
This paper introduces new algorithms for the multinomial logit bandit problem that achieve fully gap-dependent bounds, improving the understanding of how suboptimality gaps influence learning efficiency and regret.
Contribution
It presents the first algorithms with gap-dependent bounds for MNL bandits, relating the problem to top-K arm identification and introducing novel epoch-based and layer-based estimation techniques.
Findings
Algorithms identify optimal assortments efficiently with high probability.
Regret bounds depend explicitly on suboptimality gaps of items.
First to achieve fully gap-dependent bounds in the MNL bandit setting.
Abstract
We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most K from a pool of N items, and the buyer purchases an item from the assortment according to an MNL choice model. The objective is to learn the model parameters and maximize the expected revenue. We present (i) an algorithm that identifies the optimal assortment within a gap-dependent number of time steps with high probability, and (ii) an algorithm that incurs gap-dependent regret over T time steps. To our knowledge, our algorithms are the first to achieve gap-dependent bounds that fully depend on the suboptimality gaps of all items. Our technical contributions include an algorithmic framework that relates the MNL-bandit problem to a variant of the top-K arm identification problem in multi-armed…
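To make the setting concrete, here is a minimal Python sketch of the standard MNL choice model that the abstract refers to. It is an illustration under stated assumptions, not the paper's algorithm: the preference weights `v`, the per-item revenues `r`, the convention that the no-purchase option has weight 1, and all function names are hypothetical choices made for this example. The brute-force search only shows what "optimal assortment" means when the parameters are known; in the bandit problem they must be learned.

```python
from itertools import combinations


def choice_probabilities(assortment, v):
    """Purchase probabilities under the standard MNL model.

    Assumption: the no-purchase option has preference weight 1 and is
    reported under the key None.
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[None] = 1.0 / denom
    return probs


def expected_revenue(assortment, v, r):
    """Expected revenue of offering `assortment` once."""
    denom = 1.0 + sum(v[i] for i in assortment)
    return sum(r[i] * v[i] for i in assortment) / denom


def best_assortment(v, r, K):
    """Brute-force search over all non-empty assortments of size at most K.

    Only feasible for small pools; shown purely for illustration.
    """
    items = range(len(v))
    candidates = (s for k in range(1, K + 1) for s in combinations(items, k))
    return max(candidates, key=lambda s: expected_revenue(s, v, r))


# Hypothetical parameters: three items with unit revenues.
v = [1.0, 0.5, 0.25]
r = [1.0, 1.0, 1.0]

probs = choice_probabilities((0, 1), v)
best = best_assortment(v, r, K=2)
print(probs)
print(best, expected_revenue(best, v, r))
```

With unit revenues, the expected revenue increases with the total preference weight of the offered items, so the best assortment of size at most 2 in this toy instance consists of the two items with the largest weights.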
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems
