Enjoying Non-linearity in Multinomial Logistic Bandits: A Minimax-Optimal Algorithm

Pierre Boudart (SIERRA); Pierre Gaillard (Thoth); Alessandro Rudi (PSL; DI-ENS; Inria)

arXiv:2507.05306 · stat.ML · February 25, 2026

TL;DR

This paper introduces a minimax-optimal algorithm for multinomial logistic bandits that exploits the model's non-linearity to improve on existing regret bounds, and that applies to complex multi-outcome decision problems.

Contribution

It extends the analysis of non-linearity effects from binary to multinomial logistic bandits and proposes an efficient, minimax-optimal algorithm with problem-dependent regret guarantees.

Findings

01

Achieves a regret bound of $\tilde{O}\left(Rd\sqrt{KT/\kappa_{*}}\right)$

02

Provides a matching lower bound, confirming minimax optimality

03

Extends non-linearity analysis to multi-outcome settings

Abstract

We consider the multinomial logistic bandit problem in which a learner interacts with an environment by selecting actions to maximize expected rewards based on probabilistic feedback from multiple possible outcomes. In the binary setting, recent work has focused on understanding the impact of the non-linearity of the logistic model (Faury et al., 2020; Abeille et al., 2021). They introduced a problem-dependent constant $\kappa_{*} \geq 1$ that may be exponentially large in some problem parameters and which is captured by the derivative of the sigmoid function. It encapsulates the non-linearity and improves existing regret guarantees over $T$ rounds from $O(d\sqrt{T})$ to $O(d\sqrt{T/\kappa_{*}})$, where $d$ is the dimension of the parameter space. We extend their analysis to the multinomial logistic bandit framework with a finite action space, making it suitable for complex…
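To make the role of $\kappa_{*}$ in the binary setting concrete, here is a minimal sketch. It assumes the form used in the cited binary-case work, where the constant is the inverse of the sigmoid derivative evaluated at a logit, and it ignores constants and logarithmic factors in the regret rates. The function names (`kappa`, `regret_scale`) are hypothetical and chosen for illustration; this is not the paper's algorithm.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z: float) -> float:
    """Derivative of the logistic function: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def kappa(z: float) -> float:
    """Inverse sigmoid derivative at logit z.

    Assumption: illustrative stand-in for the binary-case constant,
    taken here as 1 / sigmoid'(z). It grows like exp(|z|).
    """
    return 1.0 / sigmoid_derivative(z)


def regret_scale(d: int, T: int, kappa_value: float = 1.0) -> float:
    """Leading-order scale d * sqrt(T / kappa), constants and logs dropped."""
    return d * math.sqrt(T / kappa_value)


if __name__ == "__main__":
    d, T = 10, 10_000
    for z in (0.0, 2.0, 5.0):
        k = kappa(z)
        print(f"z={z:.1f}  kappa={k:.2f}  "
              f"d*sqrt(T)={regret_scale(d, T):.1f}  "
              f"d*sqrt(T/kappa)={regret_scale(d, T, k):.1f}")
```

Under these assumptions, the gap between the two scales widens as the logit moves away from zero, which is the sense in which a large $\kappa_{*}$ helps rather than hurts.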

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management