UCB-based Algorithms for Multinomial Logistic Regression Bandits
Sanae Amani, Christos Thrampoulidis

TL;DR
This paper extends logistic bandit algorithms from binary to multinomial outcomes. It proposes MNL-UCB, a UCB-based algorithm aimed at revenue maximization, with theoretical regret bounds and numerical simulations in multi-outcome scenarios.
Contribution
It introduces MNL-UCB, a novel UCB-based algorithm for multinomial logistic bandits, providing the first regret guarantees for this setting.
Findings
MNL-UCB achieves regret of $\tilde{\mathcal{O}}(dK\sqrt{T})$ in theory.
Numerical simulations confirm the effectiveness of MNL-UCB.
The approach handles multiple outcomes beyond binary rewards.
Abstract
Out of the rich family of generalized linear bandits, perhaps the most well-studied ones are logistic bandits that are used in problems with binary rewards: for instance, when the learner/agent tries to maximize the profit over a user that can select one of two possible outcomes (e.g., 'click' vs 'no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing works do not address practical situations where the number of outcomes that can be selected by the user is larger than two (e.g., 'click', 'show me later', 'never show again', 'no click'). In this paper, we study such an extension. We use multinomial logit (MNL) to model the probability of each one of $K+1 \geq 2$ possible outcomes (+1 stands for the 'not click' outcome): we assume that for a learner's action $\mathbf{x}_t$, the user selects one of $K+1 \geq 2$ outcomes, say outcome $i$, with a…
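To make the outcome model concrete, here is a minimal sketch of multinomial logit probabilities for a single action. It assumes the standard MNL form in which the 'not click' outcome is a baseline with utility fixed to 0; the abstract is truncated before the exact parameterization, so this form, the function names `mnl_probabilities` and `expected_revenue`, and the revenue vector `rho` are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np


def mnl_probabilities(theta, x):
    """Return K+1 outcome probabilities under a standard MNL model.

    theta: (K, d) array, one hypothetical parameter vector per outcome 1..K.
    x:     (d,) action/feature vector chosen by the learner.
    Index 0 of the result is the baseline ('not click') outcome, whose
    utility is assumed to be 0; indices 1..K are the other outcomes.
    """
    utilities = np.concatenate(([0.0], theta @ x))
    utilities = utilities - utilities.max()  # shift for numerical stability
    weights = np.exp(utilities)
    return weights / weights.sum()


def expected_revenue(theta, x, rho):
    """Expected revenue, where rho[i] is the assumed revenue of outcome i+1.

    The baseline outcome is assumed to yield zero revenue.
    """
    return float(rho @ mnl_probabilities(theta, x)[1:])


# Toy inputs (made up for illustration): K = 3 outcomes, d = 2 features.
theta = np.zeros((3, 2))
x = np.array([1.0, -2.0])
rho = np.array([1.0, 2.0, 3.0])

probs = mnl_probabilities(theta, x)
revenue = expected_revenue(theta, x, rho)
```

With all parameters set to zero, every one of the K+1 outcomes is equally likely, which is a quick sanity check on the normalization.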
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Smart Grid Energy Management
