Near-Optimal MNL Bandits Under Risk Criteria

Guangyu Xi; Chao Tao; Yuan Zhou

arXiv:2009.12511 · cs.LG · March 17, 2021

TL;DR

This paper develops algorithms for MNL bandits optimized for various risk criteria, achieving near-optimal regret and demonstrating strong empirical performance on synthetic and real data.

Contribution

It introduces a unified approach to handle multiple risk criteria in MNL bandits, extending beyond expected revenue optimization.

Findings

01. Algorithms achieve near-optimal regret bounds.

02. Empirical results show strong performance on synthetic data.

03. Algorithms perform well on real-world datasets.

Abstract

We study MNL bandits, a variant of the traditional multi-armed bandit problem, under risk criteria. Unlike the ordinary expected revenue, risk criteria are more general objectives widely used in industry and business. We design algorithms for a broad class of risk criteria, including but not limited to the well-known conditional value-at-risk, Sharpe ratio, and entropy risk, and prove that they achieve near-optimal regret. As a complement, we also conduct experiments with both synthetic and real data to show the empirical performance of our proposed algorithms.
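To make the objects in the abstract concrete, the following is a minimal Python sketch of how the revenue of a fixed assortment can be scored under different risk criteria. It assumes the standard MNL choice model with the no-purchase weight normalized to 1, and it uses common textbook forms of conditional value-at-risk (lower tail), Sharpe ratio, and entropic risk; these may differ from the exact definitions in the paper. All function names, parameters, and example numbers are hypothetical, and the sketch does not implement the paper's bandit algorithms.

```python
import math


def revenue_distribution(v, r, assortment):
    """Revenue outcomes of one customer under an (assumed) standard MNL model.

    v[i] is the preference weight of item i, r[i] its revenue. The no-purchase
    option has weight 1 and revenue 0. Returns a list of (revenue, probability).
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    dist = [(0.0, 1.0 / denom)]  # no purchase
    dist += [(r[i], v[i] / denom) for i in assortment]
    return dist


def expected_revenue(dist):
    return sum(x * p for x, p in dist)


def cvar(dist, alpha):
    """Mean of the worst alpha-fraction of revenue outcomes (lower tail)."""
    remaining, total = alpha, 0.0
    for x, p in sorted(dist):  # ascending revenue: worst outcomes first
        take = min(p, remaining)
        total += x * take
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


def sharpe_ratio(dist):
    """Mean divided by standard deviation; undefined when the variance is 0."""
    mean = expected_revenue(dist)
    var = sum(p * (x - mean) ** 2 for x, p in dist)
    return mean / math.sqrt(var)


def entropic_risk(dist, theta):
    """One common reward-side convention: -(1/theta) * log E[exp(-theta * X)]."""
    return -math.log(sum(p * math.exp(-theta * x) for x, p in dist)) / theta


# Hypothetical two-item example.
v = {1: 1.0, 2: 2.0}
r = {1: 1.0, 2: 0.5}
dist = revenue_distribution(v, r, assortment=[1, 2])
scores = {
    "expected": expected_revenue(dist),
    "cvar_0.5": cvar(dist, alpha=0.5),
    "sharpe": sharpe_ratio(dist),
    "entropic_1.0": entropic_risk(dist, theta=1.0),
}
```

Under these assumed definitions, different criteria can rank assortments differently, which is why a risk-aware objective is not reducible to expected-revenue maximization in general.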

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications