Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

Min-hwan Oh; Garud Iyengar

arXiv:2103.13929 · stat.ML · March 26, 2021

TL;DR

This paper develops and analyzes algorithms for a sequential assortment selection problem in which user choice follows a multinomial logit (MNL) model. The algorithms achieve near-optimal regret bounds, and the analysis introduces a new confidence bound for MNL parameter estimation.

Contribution

It introduces two UCB-based algorithms for MNL contextual bandits, the second of which achieves near-optimal regret that matches the lower bound up to logarithmic terms, and presents a novel non-asymptotic confidence bound for the maximum likelihood estimator (MLE) of the MNL model.
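To make the idea of a UCB-based assortment choice concrete, here is a minimal sketch, not the paper's algorithm. It assumes each item has a point estimate of its utility plus a nonnegative exploration bonus (both supplied by the caller), an outside no-purchase option with utility zero, and a known per-item revenue. The names `expected_revenue` and `optimistic_assortment` are hypothetical, and the brute-force search is only sensible for small $N$ and $K$.

```python
import math
from itertools import combinations


def expected_revenue(assortment, utilities, revenues):
    """Expected revenue of an assortment under an MNL model.

    Assumes an outside (no-purchase) option with utility 0, which
    contributes the constant 1.0 to the denominator.
    """
    weights = [math.exp(utilities[i]) for i in assortment]
    denom = 1.0 + sum(weights)
    return sum(revenues[i] * w for i, w in zip(assortment, weights)) / denom


def optimistic_assortment(estimates, bonuses, revenues, K):
    """Pick the size-K assortment that maximizes expected revenue
    when every item is scored by its optimistic utility
    (estimate + exploration bonus). Brute force over all subsets.
    """
    ucb = [u + b for u, b in zip(estimates, bonuses)]
    candidates = combinations(range(len(ucb)), K)
    return max(candidates, key=lambda s: expected_revenue(s, ucb, revenues))


# Hypothetical numbers: item 2 looks poor on its point estimate but has
# a large bonus (it is under-explored), so optimism pulls it in.
estimates = [0.0, 1.0, -1.0, 0.5]
revenues = [1.0, 1.0, 1.0, 1.0]
greedy = optimistic_assortment(estimates, [0.0, 0.0, 0.0, 0.0], revenues, K=2)
optimistic = optimistic_assortment(estimates, [0.0, 0.0, 3.0, 0.0], revenues, K=2)
```

With these illustrative inputs, `greedy` is `(1, 3)` and `optimistic` is `(1, 2)`: the bonus changes which items are offered, which is the exploration mechanism that UCB-style methods rely on.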

Findings

1. The first algorithm achieves $\tilde{O}(d\sqrt{T})$ regret.
2. The second algorithm achieves $\tilde{O}(\sqrt{dT})$ regret, matching the lower bound.
3. A new confidence bound for the MNL MLE is established.
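The gap between the two regret rates can be checked with simple arithmetic. The sketch below evaluates only the leading terms $d\sqrt{T}$ and $\sqrt{dT}$, ignoring the constants and logarithmic factors hidden by $\tilde{O}$; the function name `regret_rates` and the example values of $d$ and $T$ are illustrative assumptions.

```python
import math


def regret_rates(d, T):
    """Leading terms of the two regret rates, without constants or
    logarithmic factors, plus their ratio (which equals sqrt(d))."""
    first = d * math.sqrt(T)    # rate of the first algorithm
    second = math.sqrt(d * T)   # rate of the second algorithm
    return first, second, first / second


# Hypothetical problem size: d = 16 features, T = 10,000 rounds.
first, second, ratio = regret_rates(16, 10_000)
```

For $d = 16$ and $T = 10{,}000$ this gives 1600 versus 400, a ratio of $\sqrt{16} = 4$, so the improvement grows with the feature dimension and does not depend on $T$.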

Abstract

We consider a sequential assortment selection problem where the user choice is given by a multinomial logit (MNL) choice model whose parameters are unknown. In each period, the learning agent observes $d$-dimensional contextual information about the user and the $N$ available items, offers an assortment of size $K$ to the user, and observes bandit feedback in the form of the item chosen from the assortment. We propose upper confidence bound based algorithms for this MNL contextual bandit. The first algorithm is a simple and practical method which achieves an $\tilde{O}(d\sqrt{T})$ regret over $T$ rounds. Next, we propose a second algorithm which achieves an $\tilde{O}(\sqrt{dT})$ regret. This matches the lower bound for the MNL bandit problem, up to logarithmic terms, and improves on the best known result by a $\sqrt{d}$ factor. To establish this sharper regret bound,…
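The choice model and bandit feedback described in the abstract can be illustrated with a short sketch. It assumes that each offered item has a linear utility $x^\top \theta$ in its $d$-dimensional feature vector and that the user may also pick an outside no-purchase option with utility zero; neither assumption is spelled out in the text above. The names `mnl_choice_probs` and `sample_choice` and all numbers are hypothetical.

```python
import math
import random


def mnl_choice_probs(features, theta):
    """Return (item_probs, no_purchase_prob) for one offered assortment.

    features: list of d-dimensional feature vectors, one per offered item.
    theta:    d-dimensional parameter vector (unknown to the learner).
    Assumes linear utilities and an outside option with utility 0.
    """
    utilities = [sum(x_k * t_k for x_k, t_k in zip(x, theta)) for x in features]
    # Shift by the largest utility (including the outside option's 0)
    # so that exp() cannot overflow.
    shift = max([0.0] + utilities)
    weights = [math.exp(u - shift) for u in utilities]
    outside = math.exp(-shift)
    denom = outside + sum(weights)
    return [w / denom for w in weights], outside / denom


def sample_choice(features, theta, rng):
    """Simulate bandit feedback: the index of the chosen item,
    or None if the user picks the outside option."""
    probs, p_none = mnl_choice_probs(features, theta)
    options = list(range(len(probs))) + [None]
    return rng.choices(options, weights=probs + [p_none], k=1)[0]


# Hypothetical assortment of K = 3 items with d = 2 features each.
assortment = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = [0.5, -0.2]
probs, p_none = mnl_choice_probs(assortment, theta)
choice = sample_choice(assortment, theta, random.Random(0))
```

The learner sees only `choice`, never `probs`, which is what makes the feedback bandit feedback: the parameters must be estimated from which items users actually pick.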

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems