Nearly Minimax Optimal Regret for Multinomial Logistic Bandit
Joongkyu Lee, Min-hwan Oh

TL;DR
This paper establishes nearly minimax optimal regret bounds for the multinomial logit bandit problem, introducing an efficient algorithm that matches these bounds under both uniform and non-uniform rewards.
Contribution
It provides the first proof of minimax optimality in the contextual MNL bandit setting and proposes a computationally efficient algorithm achieving these bounds.
Findings
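Establishes matching lower and upper regret bounds of order $d\sqrt{T/K}$ (up to logarithmic factors) under uniform rewards.
Establishes matching lower and upper regret bounds of order $d\sqrt{T}$ (up to logarithmic factors) under non-uniform rewards.
Introduces OFU-MNL+, a constant-time algorithm with theoretical guarantees that attains these upper bounds.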
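Ignoring constants and logarithmic factors, the uniform-reward rate is smaller than the non-uniform rate by a factor of $\sqrt{K}$. The worked comparison below uses illustrative values ($d = 10$, $T = 10^4$, $K = 4$) chosen only to make the arithmetic concrete. They are not taken from the paper.

```latex
\frac{d\sqrt{T/K}}{d\sqrt{T}} = \frac{1}{\sqrt{K}},
\qquad
d = 10,\; T = 10^4,\; K = 4:\quad
d\sqrt{T/K} = 10\sqrt{2500} = 500,
\quad
d\sqrt{T} = 10 \cdot 100 = 1000.
```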
Abstract
In this paper, we study the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an assortment based on contextual information, and user feedback follows an MNL choice model. There has been a significant discrepancy between lower and upper regret bounds, particularly regarding the maximum assortment size $K$. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of $\Omega(d\sqrt{T/K})$ and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of $\tilde{O}(d\sqrt{T/K})$. We also provide instance-dependent minimax regret bounds under uniform rewards. Under non-uniform rewards, we prove a lower bound of $\Omega(d\sqrt{T})$ and an upper bound of…
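The MNL choice model that generates user feedback can be sketched in a few lines. The sketch assumes the formulation common in the contextual MNL bandit literature: each offered item $i$ has utility $x_i^\top\theta$, and the outside (no-purchase) option has utility 0. It also treats per-round regret as the gap between the best assortment of size at most $K$ and the chosen one. The function names are hypothetical, the brute-force search only works for tiny item sets, and none of this is the OFU-MNL+ algorithm itself.

```python
import itertools
import math
from typing import Sequence


def mnl_choice_probs(utilities: Sequence[float]) -> list[float]:
    """MNL choice probabilities for an offered assortment (assumed model).

    utilities[i] plays the role of x_i^T theta for each offered item; the
    outside option has utility 0, so its weight is exp(0) = 1.
    Returns [p_0, p_1, ..., p_n], where p_0 is the no-choice probability.
    """
    weights = [math.exp(u) for u in utilities]
    denom = 1.0 + sum(weights)
    return [1.0 / denom] + [w / denom for w in weights]


def expected_reward(utilities: Sequence[float], rewards: Sequence[float]) -> float:
    """Expected reward of an assortment: sum of r_i * p(i | S)."""
    probs = mnl_choice_probs(utilities)
    return sum(r * p for r, p in zip(rewards, probs[1:]))


def best_assortment(utilities, rewards, K):
    """Brute-force the optimal assortment of size at most K (tiny instances only)."""
    best_value, best_set = 0.0, ()
    for size in range(1, K + 1):
        for subset in itertools.combinations(range(len(utilities)), size):
            value = expected_reward([utilities[i] for i in subset],
                                    [rewards[i] for i in subset])
            if value > best_value:
                best_value, best_set = value, subset
    return best_set, best_value


def instantaneous_regret(chosen, utilities, rewards, K):
    """Gap between the optimal expected reward and that of the chosen assortment."""
    _, best_value = best_assortment(utilities, rewards, K)
    chosen_value = expected_reward([utilities[i] for i in chosen],
                                   [rewards[i] for i in chosen])
    return best_value - chosen_value


if __name__ == "__main__":
    u = [0.5, -1.0, 2.0, 0.1]
    uniform = [1.0] * len(u)  # uniform rewards: every item pays the same
    print(best_assortment(u, uniform, K=2))
    print(instantaneous_regret((1,), u, uniform, K=2))
```

Under uniform rewards the expected reward is $\sum_i w_i / (1 + \sum_i w_i)$ with every $w_i > 0$. This quantity grows whenever an item is added, so the optimal assortment in this sketch always uses exactly $K$ items: the $K$ with the highest utilities.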
