Minimax Optimal Variance-Aware Regret Bounds for Multinomial Logistic MDPs

Pierre Boudart (SIERRA); Pierre Gaillard (Thoth); Alessandro Rudi (PSL; DI-ENS; Inria)

arXiv:2605.19768 · cs.AI · May 20, 2026

TL;DR

This paper introduces a variance-aware regret bound for reinforcement learning in multinomial logistic (MNL) MDPs. The bound adapts to a problem-dependent variance measure, is minimax optimal, and improves on previous bounds for structured MDPs such as KL-constrained robust MDPs.

Contribution

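The authors propose a variance-aware algorithm whose regret adapts to the normalized average variance of the optimal downstream value function along the learner's trajectory. The bound recovers existing results in the worst case and improves on them for structured MDPs; a matching lower bound establishes minimax optimality.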

Findings

1. The new regret bound depends on a problem-dependent variance measure.
2. For KL-constrained robust MDPs, the bound reduces horizon dependence by a factor of H.
3. The paper proves a matching lower bound, establishing minimax optimality.
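The page does not define KL-constrained robust MDPs. As background only: in such models the Bellman backup typically replaces the expectation over next states with a worst case over distributions Q inside a KL ball of radius ρ around the nominal transition P, and this inner problem has the standard one-dimensional convex dual $\sup_{\beta>0}\{-\beta\log\mathbb{E}_P[e^{-V/\beta}] - \beta\rho\}$. The sketch below evaluates that dual by ternary search. The function name `kl_robust_expectation`, the search interval, and the iteration count are hypothetical choices, and nothing here is taken from the paper's algorithm.

```python
import math
from typing import Sequence


def _log_mean_exp(probs: Sequence[float], values: Sequence[float], beta: float) -> float:
    # log E_P[exp(-V / beta)] via log-sum-exp, skipping zero-probability states.
    terms = [math.log(p) - v / beta for p, v in zip(probs, values) if p > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def kl_robust_expectation(probs: Sequence[float], values: Sequence[float],
                          rho: float, iters: int = 200) -> tuple[float, list[float]]:
    """Approximate inf_{Q : KL(Q || P) <= rho} E_Q[V] (hypothetical helper).

    Maximises the concave dual  -beta * log E_P[exp(-V/beta)] - beta * rho
    over beta > 0 by ternary search, and returns the dual value together with
    the exponentially tilted distribution Q_i proportional to P_i * exp(-V_i / beta).
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    mean = sum(p * v for p, v in zip(probs, values))
    v_min = min(v for p, v in zip(probs, values) if p > 0)
    if mean - v_min <= 0:
        return v_min, list(probs)  # V is constant on the support of P

    def dual(beta: float) -> float:
        return -beta * _log_mean_exp(probs, values, beta) - beta * rho

    # By Jensen the dual is at most mean - beta * rho, and its supremum is at
    # least v_min, so the maximiser lies below (mean - v_min) / rho.
    lo, hi = 1e-6, (mean - v_min) / rho + 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dual(m1) < dual(m2):
            lo = m1  # the maximum of a concave function lies in [m1, hi]
        else:
            hi = m2  # the maximum lies in [lo, m2]
    beta = (lo + hi) / 2

    # Worst-case distribution: exponential tilting of P (a softmax in -V / beta).
    log_w = [math.log(p) - v / beta if p > 0 else -math.inf for p, v in zip(probs, values)]
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return dual(beta), [x / total for x in w]
```

For P uniform over two next states with values 0 and 1 and ρ = 0.1, the worst case shifts mass toward the low-value state and the robust value comes out near 0.28, below the nominal mean of 0.5. Ternary search is used only because the dual is concave in β; any scalar concave maximiser would do.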

Abstract

We study reinforcement learning for episodic Markov Decision Processes (MDPs) whose transitions are modelled by a multinomial logistic (MNL) model. Existing algorithms for MNL mixture MDPs yield a regret of $\tilde{O}(dH^{2}\sqrt{T})$ (Li et al., 2024), where $d$ is the feature dimension, $H$ the episode length, and $T$ the number of episodes. Inspired by the logistic bandit literature (Abeille et al., 2021; Faury et al., 2022; Boudart et al., 2026), we introduce a problem-dependent constant $\bar{\sigma}_T \leq 1/2$, measuring the normalised average variance of the optimal downstream value function along the learner's trajectory. We propose an algorithm achieving a regret of $\tilde{O}(dH^{2}\sqrt{\bar{\sigma}_T T})$, which recovers the existing bound in the worst case and improves upon it for structured MDPs. For instance, for KL-constrained robust MDPs, $\bar{\sigma}_T =$…
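The abstract rests on two ingredients: an MNL transition model and the variance of the downstream value function under that model. A minimal sketch of both, assuming the common MNL parameterisation $P(s' \mid s,a) \propto \exp(\varphi(s,a,s')^\top\theta)$ over a finite set of reachable next states (the paper's MNL mixture model may differ in detail). The helper names `mnl_transition_probs` and `next_state_variance` are hypothetical. The page does not state how $\bar{\sigma}_T$ normalises and averages these variances, so the sketch stops at the per-step conditional variance.

```python
import math
from typing import Sequence


def mnl_transition_probs(features: Sequence[Sequence[float]],
                         theta: Sequence[float]) -> list[float]:
    """Softmax over reachable next states: P(s'|s,a) proportional to exp(phi(s,a,s')^T theta).

    `features[j]` is the (assumed) feature vector phi(s, a, s'_j) of the j-th
    reachable next state; `theta` is the unknown parameter the learner estimates.
    """
    logits = [sum(f * t for f, t in zip(phi, theta)) for phi in features]
    m = max(logits)  # subtract the largest logit for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def next_state_variance(probs: Sequence[float], values: Sequence[float]) -> float:
    """Variance of V(s') when s' is drawn from `probs`."""
    mean = sum(p * v for p, v in zip(probs, values))
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
```

With θ = 0 every reachable next state is equally likely, so for two states with values 0 and 1 the conditional variance is 0.25; skewing θ toward one state lowers it, which is the kind of structure a variance-aware bound can exploit.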
