Ghosts of Softmax: Complex Singularities That Limit Safe Step Sizes in Cross-Entropy

Piyush Sao

arXiv:2603.13552·cs.LG·March 17, 2026

TL;DR

This paper reveals that complex singularities in the softmax function impose fundamental limits on safe step sizes in cross-entropy optimization, and proposes a geometric method to improve training stability.

Contribution

The author derives explicit formulas for the Taylor convergence radius of the cross-entropy loss, linking complex analysis to optimization stability, and proposes a practical step-size normalization scheme.

Findings

01

Normalized step size $r<1$ ensures stable training across models.

02

Complex singularities in softmax limit the Taylor convergence radius.

03

A controller enforcing $\tau < \rho_a$ improves robustness significantly.

Abstract

Optimization analyses for cross-entropy training rely on local Taylor models of the loss to predict whether a proposed step will decrease the objective. These surrogates are reliable only inside the Taylor convergence radius of the true loss along the update direction. That radius is set not by real-line curvature alone but by the nearest complex singularity. For cross-entropy, the softmax partition function $F = \sum_{j} \exp(z_{j})$ has complex zeros ("ghosts of softmax") that induce logarithmic singularities in the loss and cap this radius. To make this geometry usable, we derive closed-form expressions under logit linearization along the proposed update direction. In the binary case, the exact radius is $\rho^{*} = \sqrt{\delta^{2} + \pi^{2}} / \Delta_{a}$. In the multiclass case, we obtain the lower bound $\rho_{a} = \pi / \Delta_{a}$, where $\Delta_{a} = \max_{k} a_{k} - \min_{k} a_{k}$ is the spread of directional…
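The radius formulas in the abstract can be checked numerically with a short sketch. It is not the paper's code. It assumes that $a_k$ is the derivative of logit $k$ along the update direction, so that the linearized logits are $z_k + a_k \tau$, and that in the binary case $\delta$ is the gap between the two logits; the abstract is cut off before it defines either, and all function names here are hypothetical.

```python
import cmath
import math


def spread(a):
    """Delta_a = max_k a_k - min_k a_k (spread of directional logit derivatives)."""
    return max(a) - min(a)


def multiclass_radius_bound(a):
    """Lower bound rho_a = pi / Delta_a on the Taylor convergence radius."""
    d = spread(a)
    return math.inf if d == 0 else math.pi / d


def binary_radius(delta, delta_a):
    """Binary-case radius sqrt(delta^2 + pi^2) / Delta_a.

    Assumption: delta is the gap between the two logits.
    """
    return math.inf if delta_a == 0 else math.hypot(delta, math.pi) / delta_a


def partition(z, a, tau):
    """Softmax partition function F(tau) = sum_j exp(z_j + a_j * tau), complex tau."""
    return sum(cmath.exp(zj + aj * tau) for zj, aj in zip(z, a))


# Binary example with made-up logits z and directional derivatives a.
z = [1.0, -0.5]
a = [0.3, -1.2]
delta = z[0] - z[1]
delta_a = spread(a)

# Nearest complex zero of F: solve delta + (a0 - a1) * tau = i * pi.
tau_ghost = complex(-delta, math.pi) / (a[0] - a[1])

print("|F(tau_ghost)| =", abs(partition(z, a, tau_ghost)))  # numerically zero
print("|tau_ghost|    =", abs(tau_ghost))
print("binary radius  =", binary_radius(delta, delta_a))
print("pi / Delta_a   =", multiclass_radius_bound(a))
```

In this example the distance from the origin to the zero of $F$ matches the binary-case radius, and the bound $\pi / \Delta_a$ is no larger, with equality when $\delta = 0$.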

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural dynamics and brain function · Neural Networks and Reservoir Computing