Fully Gap-Dependent Bounds for Multinomial Logit Bandit

Jiaqi Yang

arXiv:2011.09998·cs.LG·November 20, 2020·1 cites

Open Access

TL;DR

This paper introduces new algorithms for the multinomial logit bandit problem that achieve fully gap-dependent bounds, improving the understanding of how suboptimality gaps influence learning efficiency and regret.

Contribution

It presents the first algorithms for MNL bandits whose bounds depend fully on the suboptimality gaps of all items, relating the problem to top-$K$ arm identification and introducing novel epoch-based and layer-based estimation techniques.

Findings

01

Algorithms identify optimal assortments efficiently with high probability.

02

Regret bounds depend explicitly on suboptimality gaps of items.

03

First to achieve fully gap-dependent bounds in the MNL bandit setting.

Abstract

We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items, and the buyer purchases an item from the assortment according to an MNL choice model. The objective is to learn the model parameters and maximize the expected revenue. We present (i) an algorithm that identifies the optimal assortment $S^{*}$ within $O(\sum_{i=1}^{N} \Delta_i^{-2})$ time steps with high probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^{*}} K \Delta_i^{-1} \log T)$ regret in $T$ time steps. To our knowledge, our algorithms are the first to achieve gap-dependent bounds that fully depend on the suboptimality gaps of all items. Our technical contributions include an algorithmic framework that relates the MNL-bandit problem to a variant of the top-$K$ arm identification problem in multi-armed…
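
To make the setting in the abstract concrete, the sketch below evaluates an assortment under the commonly used MNL parameterization. This parameterization is an assumption, since the abstract does not spell out the model: each item $i$ has a preference weight `v[i]` and a revenue `r[i]`, and the no-purchase option has weight 1. The function names are hypothetical, and the brute-force search only illustrates what "optimal assortment" means for tiny $N$. It is not the paper's algorithm, which has to learn the weights from purchase feedback.

```python
from itertools import combinations

def choice_probabilities(assortment, v):
    """Purchase probabilities under the assumed MNL model.

    The key None stands for the no-purchase outcome (weight normalized to 1).
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[None] = 1.0 / denom
    return probs

def expected_revenue(assortment, v, r):
    """Expected revenue of offering `assortment` for one time step."""
    denom = 1.0 + sum(v[i] for i in assortment)
    return sum(r[i] * v[i] for i in assortment) / denom

def best_assortment(v, r, K):
    """Brute-force search over all non-empty assortments of size at most K.

    Only feasible for small N; shown purely to define the optimum S*.
    """
    n = len(v)
    candidates = (S for k in range(1, K + 1)
                  for S in combinations(range(n), k))
    return max(candidates, key=lambda S: expected_revenue(S, v, r))

# Illustrative (made-up) parameters: N = 3 items, capacity K = 2.
v = [1.0, 0.5, 0.25]
r = [1.0, 1.0, 1.0]
S_star = best_assortment(v, r, K=2)
```

With these made-up parameters, the search compares every assortment of one or two items and keeps the one with the highest expected revenue. A bandit algorithm must reach the same assortment without knowing `v` in advance.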

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems