Discrete Choice Multi-Armed Bandits
Emerson Melo, David Müller

TL;DR
This paper links discrete choice models with multiarmed bandit algorithms, providing regret bounds and introducing a flexible, efficiently implementable family of adversarial bandit algorithms inspired by nested logit models.
Contribution
It establishes regret bounds for a broad class of algorithms and introduces a new family of adversarial bandit algorithms based on nested logit models.
Findings
Sublinear regret bounds for the algorithms.
Introduction of a flexible family of adversarial bandit algorithms.
Numerical experiments demonstrating practical implementation.
Abstract
This paper establishes a connection between a class of discrete choice models and the fields of online learning and multiarmed bandit algorithms. Our contributions are twofold. First, we provide sublinear regret bounds for a broad family of algorithms that includes the Exp3 algorithm as a special case. Second, we introduce a novel family of adversarial multiarmed bandit algorithms inspired by the generalized nested logit models introduced by Wen and Koppelman (2001). These algorithms let users fine-tune the model extensively, and they can be implemented efficiently because their sampling probabilities are available in closed form. To demonstrate the practical implementation of our algorithms, we present numerical experiments focusing on the stochastic bandit case.
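To make the link between choice probabilities and bandit sampling concrete, here is a minimal Python sketch. It is not the authors' implementation: it uses the plain nested logit model (a simpler special case of the generalized nested logit family, with each arm in exactly one nest), a loss-based importance-weighted estimator, and Bernoulli rewards. The names nested_logit_probs, softmax, and run_bandit, as well as the parameters eta, nests, and lambdas, are hypothetical and chosen only for illustration. When every nest parameter equals 1, the sampling distribution reduces to the softmax (multinomial logit) used by Exp3-style algorithms.

```python
import numpy as np


def softmax(scores):
    """Multinomial logit probabilities (the Exp3-style sampling rule)."""
    z = np.asarray(scores, dtype=float)
    w = np.exp(z - z.max())  # shift by the max for numerical stability
    return w / w.sum()


def nested_logit_probs(scores, nests, lambdas):
    """Closed-form nested logit choice probabilities.

    Assumptions (illustrative, not taken from the paper):
    - nests is a list of index lists that partition the arms,
    - lambdas[k] in (0, 1] is the parameter of nest k.
    P(arm i) = P(nest k) * P(i | nest k).
    """
    scores = np.asarray(scores, dtype=float)
    probs = np.zeros_like(scores)
    within, log_weights = [], []
    for idx, lam in zip(nests, lambdas):
        z = scores[idx] / lam
        m = z.max()
        log_sum = m + np.log(np.exp(z - m).sum())  # log-sum-exp over the nest
        within.append(np.exp(z - log_sum))         # P(i | nest k)
        log_weights.append(lam * log_sum)          # log of the nest weight
    log_weights = np.array(log_weights)
    nest_probs = np.exp(log_weights - log_weights.max())
    nest_probs /= nest_probs.sum()                 # P(nest k)
    for idx, p_k, p_in in zip(nests, nest_probs, within):
        probs[idx] = p_k * p_in
    return probs


def run_bandit(means, nests, lambdas, eta, horizon, seed=0):
    """Sample arms from nested logit probabilities over estimated scores.

    Uses Bernoulli rewards with the given means (a stochastic-bandit
    assumption) and a loss-based importance-weighted score update.
    Returns the total reward collected.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(means)
    est = np.zeros(n_arms)  # cumulative estimated scores (nonpositive)
    total = 0.0
    for _ in range(horizon):
        p = nested_logit_probs(eta * est, nests, lambdas)
        arm = rng.choice(n_arms, p=p)
        reward = float(rng.random() < means[arm])
        est[arm] -= (1.0 - reward) / p[arm]  # unbiased loss estimate
        total += reward
    return total


if __name__ == "__main__":
    nests = [[0, 1], [2, 3]]
    print(nested_logit_probs([0.5, -1.0, 2.0, 0.0], nests, [0.5, 0.8]))
    print(run_bandit([0.2, 0.3, 0.8, 0.4], nests, [0.5, 0.8], eta=0.05, horizon=1000))
```

In this sketch, smaller values in lambdas make arms within the same nest closer substitutes for one another, which is the kind of tuning flexibility the abstract refers to. The grouping of arms into nests is a modeling choice left to the user.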
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Auction Theory and Applications
