Improved Online Confidence Bounds for Multinomial Logistic Bandits
Joongkyu Lee, Min-hwan Oh

TL;DR
This paper introduces an improved online confidence bound for multinomial logistic models, leading to more efficient algorithms with variance-dependent regret bounds in MNL bandit problems.
Contribution
The paper derives a tighter online confidence bound for MNL models and proposes two algorithms, OFU-MNL++ and OFU-MN$^2$L, whose regret guarantees depend less on the norm bound of the unknown parameter and on the maximum size of possible outcomes.
Findings
Achieved a variance-dependent regret bound of $O(d \, \log T \sqrt{\sum_{t=1}^T \sigma_t^2})$
Developed the OFU-MNL++ algorithm with constant-time complexity
Introduced the OFU-MN$^2$L algorithm with poly(B)-free regret bounds
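To make the term "variance-dependent" concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a standard MNL choice model with linear utilities and an outside option of utility 0, and it takes $\sigma_t^2$ to be the variance of the realized reward in round $t$; the summary itself does not spell out these definitions. All function names are hypothetical, and `regret_rate` only evaluates the order-level expression $d \log T \sqrt{\sum_t \sigma_t^2}$ with constants dropped.

```python
import math


def mnl_probabilities(theta, items):
    """Choice probabilities for a non-empty assortment under an MNL model.

    Assumption: an outside option (no choice) with utility 0 is always present.
    Returns (p_outside, [p_item_1, ..., p_item_k]).
    """
    utilities = [sum(t * x for t, x in zip(theta, feat)) for feat in items]
    # Shift by the largest utility (the outside option has utility 0)
    # so that exp() cannot overflow.
    shift = max(0.0, max(utilities))
    weights = [math.exp(u - shift) for u in utilities]
    outside_weight = math.exp(-shift)
    total = outside_weight + sum(weights)
    return outside_weight / total, [w / total for w in weights]


def reward_variance(probs, rewards):
    """Variance of the realized reward when item i is chosen with
    probability probs[i] and pays rewards[i]; the outside option pays 0."""
    mean = sum(p * r for p, r in zip(probs, rewards))
    second_moment = sum(p * r * r for p, r in zip(probs, rewards))
    return second_moment - mean * mean


def regret_rate(d, variances):
    """Order-level quantity d * log(T) * sqrt(sum of variances), constants dropped."""
    horizon = len(variances)
    return d * math.log(horizon) * math.sqrt(sum(variances))


if __name__ == "__main__":
    theta = [0.5, -0.2]              # hypothetical parameter vector
    items = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical item features
    _, probs = mnl_probabilities(theta, items)
    sigma_sq = reward_variance(probs, rewards=[1.0, 1.0])
    print(regret_rate(d=2, variances=[sigma_sq] * 1000))
```

Under these assumptions, rounds in which the choice outcome is nearly deterministic contribute little to $\sum_t \sigma_t^2$, so the expression grows more slowly than a worst-case $\sqrt{T}$ rate.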
Abstract
In this paper, we propose an improved online confidence bound for multinomial logistic (MNL) models and apply this result to MNL bandits, achieving variance-dependent optimal regret. Recently, Lee & Oh (2024) established an online confidence bound for MNL models and achieved nearly minimax-optimal regret in MNL bandits. However, their results still depend on the norm bound $B$ of the unknown parameter and on the maximum size of possible outcomes. To address this, we first derive an online confidence bound that significantly improves on the previous bound of Lee & Oh (2024). This is mainly achieved by establishing tighter self-concordant properties of the MNL loss and applying Ville's inequality to bound the estimation error. Using this new online confidence bound, we propose a…
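For reference, Ville's inequality in its standard textbook form is stated below. This is the general result, not the paper's specific application of it; the symbols $M_t$ and $\delta$ are generic and do not come from the abstract. For a nonnegative supermartingale $(M_t)_{t \ge 0}$ with $\mathbb{E}[M_0] \le 1$ and any $\delta \in (0, 1)$,

$$\Pr\left( \exists\, t \ge 0 : M_t \ge \frac{1}{\delta} \right) \le \delta.$$

Because the bound holds simultaneously over all $t$, it is a common tool for building confidence sets that remain valid at every round of an online procedure.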
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Optimization and Search Problems
