Softmax is $1/2$-Lipschitz: A tight bound across all $\ell_p$ norms
Pravin Nair

TL;DR
This paper proves that the softmax function has a Lipschitz constant of 1/2 across all p-norms, providing tighter robustness and convergence guarantees and validating this bound empirically on neural architectures and reinforcement learning policies.
Contribution
It establishes a uniform Lipschitz constant of 1/2 for softmax across all p-norms, a norm-uniform analysis that the authors believe to be the first of its kind and that tightens existing robustness and convergence guarantees.
Findings
Softmax has a Lipschitz constant of 1/2 across all p-norms.
The local Lipschitz constant attains 1/2 for p=1 and p=∞.
Empirical validation on neural models confirms the sharpness of the bound.
Abstract
The softmax function is a basic operator in machine learning and optimization, used in classification, attention mechanisms, reinforcement learning, game theory, and problems involving log-sum-exp terms. Existing robustness guarantees of learning models and convergence analysis of optimization algorithms typically consider the softmax operator to have a Lipschitz constant of $1$ with respect to the $\ell_2$ norm. In this work, we prove that the softmax function is contractive with the Lipschitz constant $1/2$, uniformly across all $\ell_p$ norms with $p \ge 1$. We also show that the local Lipschitz constant of softmax attains $1/2$ for $p = 1$ and $p = \infty$, and for $p \in (1, \infty)$, the constant remains strictly below $1/2$ and the supremum is achieved only in the limit. To our knowledge, this is the first comprehensive norm-uniform analysis of softmax Lipschitz continuity.…
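The bound stated in the abstract can be spot-checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes NumPy, uses the standard softmax Jacobian $J(x) = \mathrm{diag}(s) - s s^\top$ with $s = \mathrm{softmax}(x)$, and the helper names (`softmax_jacobian`, `induced_norm`, `lipschitz_ratio`) are hypothetical. It samples random inputs, checks that the induced matrix norm of the Jacobian stays at or below 1/2 for $p \in \{1, 2, \infty\}$, and evaluates one point where a softmax output equals exactly 1/2, which is where the $p = 1$ and $p = \infty$ norms reach the bound.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the output is unchanged.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def softmax_jacobian(x):
    # Standard closed form: J = diag(s) - s s^T, with s = softmax(x).
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)

def induced_norm(J, p):
    # For 2-D arrays, np.linalg.norm with ord in {1, 2, inf} returns the
    # induced (operator) norm: max column sum, spectral norm, max row sum.
    return np.linalg.norm(J, ord=p)

def lipschitz_ratio(x, y, p):
    # Empirical ratio ||softmax(x) - softmax(y)||_p / ||x - y||_p.
    return np.linalg.norm(softmax(x) - softmax(y), ord=p) / np.linalg.norm(x - y, ord=p)

rng = np.random.default_rng(0)

# Largest Jacobian norm seen over random inputs, per norm.
worst_jac = {1: 0.0, 2: 0.0, np.inf: 0.0}
# Largest empirical ratio seen over random pairs; p = 3 is included as an
# intermediate norm that has no closed-form matrix norm in NumPy.
worst_ratio = {1: 0.0, 2: 0.0, 3: 0.0, np.inf: 0.0}

for _ in range(2000):
    x = rng.normal(scale=3.0, size=5)
    y = rng.normal(scale=3.0, size=5)
    J = softmax_jacobian(x)
    for p in worst_jac:
        worst_jac[p] = max(worst_jac[p], induced_norm(J, p))
    for p in worst_ratio:
        worst_ratio[p] = max(worst_ratio[p], lipschitz_ratio(x, y, p))

# A point whose softmax output is (1/2, 1/4, 1/4): the first column of the
# Jacobian then has absolute sum 2 * (1/2) * (1 - 1/2) = 1/2.
x_half = np.array([np.log(2.0), 0.0, 0.0])
J_half = softmax_jacobian(x_half)

print("max Jacobian norms:", worst_jac)
print("max empirical ratios:", worst_ratio)
print("norms at x_half:", {p: induced_norm(J_half, p) for p in (1, 2, np.inf)})
```

Random sampling can only fail to find a violation; it does not prove the bound. The check at `x_half` is the more informative part, since it shows the $p = 1$ and $p = \infty$ Jacobian norms reaching 1/2 at a finite input while the spectral norm at the same point stays below it.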
