Softmax is $1/2$-Lipschitz: A tight bound across all $\ell_p$ norms

Pravin Nair

arXiv:2510.23012·cs.LG·October 28, 2025

TL;DR

This paper proves that the softmax function is $1/2$-Lipschitz with respect to every $\ell_p$ norm with $p \geq 1$, which tightens robustness and convergence guarantees that assume a constant of $1$. The bound is validated empirically on neural architectures and reinforcement learning policies.

Contribution

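It establishes a uniform Lipschitz constant of $1/2$ for softmax across all $\ell_p$ norms with $p \geq 1$. The author describes this as the first comprehensive norm-uniform analysis of softmax Lipschitz continuity, and it improves theoretical guarantees that rely on the constant.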

Findings

01

Softmax has a Lipschitz constant of 1/2 across all p-norms.

02

The local Lipschitz constant attains 1/2 for $p = 1$ and $p = \infty$.

03

Empirical validation on neural models confirms the sharpness of the bound.
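The second finding can be illustrated numerically for the two norms where the induced matrix norm has a closed form. The sketch below is not taken from the paper: it assumes the standard softmax Jacobian $\mathrm{diag}(s) - s s^{\top}$ with $s = \mathrm{softmax}(x)$, and the helper names `softmax` and `softmax_jacobian` are illustrative. It evaluates the induced $\ell_1$ and $\ell_\infty$ norms at $x = (0, 0)$, where both evaluate to 0.5, and records the largest value seen over random inputs.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the output is unchanged.
    shifted = x - np.max(x)
    e = np.exp(shifted)
    return e / e.sum()

def softmax_jacobian(x):
    # Standard closed form: J = diag(s) - s s^T, with s = softmax(x).
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)

# At x = (0, 0) the output is (1/2, 1/2).
J = softmax_jacobian(np.zeros(2))
norm_1 = np.linalg.norm(J, ord=1)         # induced l_1 norm: max column sum
norm_inf = np.linalg.norm(J, ord=np.inf)  # induced l_inf norm: max row sum

# Largest induced norm observed over random inputs (illustrative settings).
rng = np.random.default_rng(0)
worst_1 = 0.0
worst_inf = 0.0
for _ in range(1000):
    x = 3.0 * rng.standard_normal(6)
    Jx = softmax_jacobian(x)
    worst_1 = max(worst_1, np.linalg.norm(Jx, ord=1))
    worst_inf = max(worst_inf, np.linalg.norm(Jx, ord=np.inf))

print(norm_1, norm_inf, worst_1, worst_inf)
```

Each absolute row sum of the Jacobian equals $2 s_i (1 - s_i)$, which is at most $1/2$ and reaches it when some $s_i = 1/2$, so the random inputs should never exceed the value found at $x = (0, 0)$.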

Abstract

The softmax function is a basic operator in machine learning and optimization, used in classification, attention mechanisms, reinforcement learning, game theory, and problems involving log-sum-exp terms. Existing robustness guarantees of learning models and convergence analysis of optimization algorithms typically consider the softmax operator to have a Lipschitz constant of $1$ with respect to the $\ell_2$ norm. In this work, we prove that the softmax function is contractive with the Lipschitz constant $1/2$, uniformly across all $\ell_p$ norms with $p \geq 1$. We also show that the local Lipschitz constant of softmax attains $1/2$ for $p = 1$ and $p = \infty$, and for $p \in (1, \infty)$, the constant remains strictly below $1/2$ and the supremum $1/2$ is achieved only in the limit. To our knowledge, this is the first comprehensive norm-uniform analysis of softmax Lipschitz continuity.…
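As a quick sanity check of the global claim, the following sketch samples random input pairs and tracks the largest observed ratio $\|\mathrm{softmax}(x) - \mathrm{softmax}(y)\|_p / \|x - y\|_p$ for a few values of $p$. It is an illustration, not the paper's experimental setup: the dimensions, scales, sample count, and choice of $p$ values are all assumptions. A random search of this kind can only fail to find a counterexample; it does not prove the bound.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the output is unchanged.
    shifted = x - np.max(x)
    e = np.exp(shifted)
    return e / e.sum()

rng = np.random.default_rng(0)
p_values = [1, 1.5, 2, 4, np.inf]      # illustrative choice of norms
max_ratio = {p: 0.0 for p in p_values}

for _ in range(2000):
    dim = rng.integers(2, 8)               # dimension between 2 and 7
    scale = rng.choice([0.1, 1.0, 5.0])    # vary the input magnitude
    x = scale * rng.standard_normal(dim)
    y = x + 0.5 * rng.standard_normal(dim)
    diff_out = softmax(x) - softmax(y)
    diff_in = x - y
    for p in p_values:
        ratio = np.linalg.norm(diff_out, ord=p) / np.linalg.norm(diff_in, ord=p)
        max_ratio[p] = max(max_ratio[p], ratio)

for p, r in max_ratio.items():
    print(f"p = {p}: largest observed ratio = {r:.4f}")
```

If the stated bound holds, every printed ratio stays at or below 0.5 regardless of the sampled dimension or scale.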
