Thompson Sampling for the MNL-Bandit

Shipra Agrawal; Vashist Avadhanula; Vineet Goyal; Assaf Zeevi

arXiv:1706.00977 · cs.LG · January 7, 2019 · 24 cites

TL;DR

This paper introduces a Thompson Sampling approach for the MNL-Bandit problem, a sequential subset selection task with unknown parameters, achieving near-optimal regret and strong numerical results.

Contribution

It adapts Thompson Sampling to the MNL-Bandit problem, providing a near-optimal regret guarantee and demonstrating effective numerical performance.

Findings

1. Achieves near-optimal regret bounds.

2. Demonstrates strong numerical performance.

3. Addresses a broad class of exploration-exploitation problems.

Abstract

We consider a sequential subset selection problem under parameter uncertainty, where at each time step, the decision maker selects a subset of cardinality $K$ from $N$ possible items (arms), and observes a (bandit) feedback in the form of the index of one of the items in said subset, or none. Each item in the index set is ascribed a certain value (reward), and the feedback is governed by a Multinomial Logit (MNL) choice model whose parameters are a priori unknown. The objective of the decision maker is to maximize the expected cumulative rewards over a finite horizon $T$, or alternatively, minimize the regret relative to an oracle that knows the MNL parameters. We refer to this as the MNL-Bandit problem. This problem is representative of a larger family of exploration-exploitation problems that involve a combinatorial objective, and arise in several important application domains. We…
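The feedback model described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm; it only simulates the environment and the oracle benchmark. It assumes the standard MNL parameterization, in which item $i$ has a positive weight `v[i]` and the "none" outcome has weight 1, so item $i$ in an offered subset $S$ is chosen with probability $v_i / (1 + \sum_{j \in S} v_j)$. The function names, the brute-force oracle, and the choice to let the oracle use subsets of size at most $K$ are all assumptions made for illustration.

```python
import itertools
import random


def mnl_choice_probs(subset, v):
    """Choice probabilities under an MNL model with outside-option weight 1.

    Returns a dict mapping each offered item to its probability,
    plus the key None for the "no item chosen" outcome.
    """
    denom = 1.0 + sum(v[i] for i in subset)
    probs = {i: v[i] / denom for i in subset}
    probs[None] = 1.0 / denom
    return probs


def expected_reward(subset, v, r):
    """Expected one-step reward of offering `subset` (rewards r, weights v)."""
    denom = 1.0 + sum(v[i] for i in subset)
    return sum(r[i] * v[i] for i in subset) / denom


def sample_feedback(subset, v, rng):
    """Draw one bandit feedback: the index of the chosen item, or None."""
    probs = mnl_choice_probs(subset, v)
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def oracle_subset(v, r, K):
    """Best subset of size at most K, found by brute force.

    Only practical for small N; used here as the regret benchmark
    (an oracle that knows the MNL weights v).
    """
    n = len(v)
    candidates = (
        s for k in range(1, K + 1)
        for s in itertools.combinations(range(n), k)
    )
    return max(candidates, key=lambda s: expected_reward(s, v, r))


# Hypothetical instance: 3 items, subsets of size at most 2.
v = [1.0, 0.5, 2.0]   # MNL weights, unknown to the learner
r = [1.0, 2.0, 0.5]   # per-item rewards, known to the learner
best = oracle_subset(v, r, K=2)
offered = (0, 2)
per_step_regret = expected_reward(best, v, r) - expected_reward(offered, v, r)
feedback = sample_feedback(offered, v, random.Random(0))
```

Summing `per_step_regret` over the horizon $T$ gives the cumulative regret that a learning policy, such as the Thompson Sampling approach studied in the paper, aims to keep small while only ever seeing `feedback`.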

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems