Improved Online Confidence Bounds for Multinomial Logistic Bandits

Joongkyu Lee; Min-hwan Oh

arXiv:2502.10020 · stat.ML · June 17, 2025

TL;DR

This paper introduces an improved online confidence bound for multinomial logistic models, leading to more efficient algorithms with variance-dependent regret bounds in MNL bandit problems.

Contribution

The paper derives a tighter online confidence bound for MNL models and proposes two algorithms with improved regret guarantees that depend less on the parameter norm bound $B$ and the maximum number of possible outcomes $K$.

Findings

01

Achieved a variance-dependent regret bound of $O(d \, \log T \sqrt{\sum_{t=1}^T \sigma_t^2})$

02

Developed the OFU-MNL++ algorithm with constant-time complexity

03

Introduced the OFU-MN$^2$L algorithm with poly(B)-free regret bounds.
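
To illustrate what the variance-dependent rate in the first finding implies, the sketch below evaluates the leading term $d \log T \sqrt{\sum_{t=1}^T \sigma_t^2}$ with constants dropped. The function name `variance_dependent_rate` and both variance sequences are hypothetical, chosen only for illustration; the precise definition of $\sigma_t^2$ is given in the paper and is not reproduced here.

```python
import math

def variance_dependent_rate(d, sigma_sq):
    """Leading term d * log(T) * sqrt(sum_t sigma_t^2), constants dropped.

    sigma_sq is a hypothetical per-round variance sequence of length T.
    """
    T = len(sigma_sq)
    return d * math.log(T) * math.sqrt(sum(sigma_sq))

d, T = 5, 10_000

# Assumed sequences: constant variance vs. variance decaying like 1/t.
constant = [1.0] * T
decaying = [1.0 / t for t in range(1, T + 1)]

rate_constant = variance_dependent_rate(d, constant)  # sum is T, so this is d * log(T) * sqrt(T)
rate_decaying = variance_dependent_rate(d, decaying)  # sum grows only like log(T)

print(rate_constant, rate_decaying)
```

Under these assumed sequences, the constant-variance case recovers the familiar $\sqrt{T}$ scaling, while the decaying case gives a much smaller value, which is the sense in which a variance-dependent bound can be tighter.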

Abstract

In this paper, we propose an improved online confidence bound for multinomial logistic (MNL) models and apply this result to MNL bandits, achieving variance-dependent optimal regret. Recently, Lee & Oh (2024) established an online confidence bound for MNL models and achieved nearly minimax-optimal regret in MNL bandits. However, their results still depend on the norm-boundedness of the unknown parameter $B$ and the maximum size of possible outcomes $K$. To address this, we first derive an online confidence bound of $O(d \log t + B d)$, which is a significant improvement over the previous bound of $O(B d \log t \log K)$ (Lee & Oh, 2024). This is mainly achieved by establishing tighter self-concordant properties of the MNL loss and applying Ville's inequality to bound the estimation error. Using this new online confidence bound, we propose a…
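
For readers unfamiliar with the model the abstract refers to, here is a minimal sketch of MNL choice probabilities in the form commonly used in the MNL bandit literature: linear utilities plus an outside (no-choice) option with utility zero. The excerpt does not spell this formulation out, so treat it as an assumption; `mnl_probabilities`, the feature vectors, and the parameter values are hypothetical.

```python
import math

def mnl_probabilities(features, theta):
    """Return [p_outside, p_1, ..., p_K] for an offered set of K items.

    Assumes utility_i = <x_i, theta> and an outside option with utility 0.
    """
    utilities = [sum(x_k * th_k for x_k, th_k in zip(x, theta)) for x in features]
    logits = [0.0] + utilities           # index 0 is the outside option
    shift = max(logits)                  # subtract the max for numerical stability
    weights = [math.exp(u - shift) for u in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical offered set of three items with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = [0.5, -0.5]

probs = mnl_probabilities(features, theta)
print(probs)
```

Here the third item has utility zero, so it is chosen with the same probability as the outside option, and the first item (utility 0.5) is the most likely choice.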

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Optimization and Search Problems