Ghosts of Softmax: Complex Singularities That Limit Safe Step Sizes in Cross-Entropy
Piyush Sao

TL;DR
This paper reveals that complex singularities in the softmax function impose fundamental limits on safe step sizes in cross-entropy optimization, and proposes a geometric method to improve training stability.
Contribution
The authors derive explicit formulas for the Taylor convergence radius in cross-entropy, linking complex analysis to optimization stability and proposing a practical normalization scheme.
Findings
Normalized step size $r<1$ ensures stable training across models.
Complex singularities in softmax limit the Taylor convergence radius.
A controller enforcing $\tau < \rho_a$ improves robustness significantly.
Abstract
Optimization analyses for cross-entropy training rely on local Taylor models of the loss to predict whether a proposed step will decrease the objective. These surrogates are reliable only inside the Taylor convergence radius of the true loss along the update direction. That radius is set not by real-line curvature alone but by the nearest complex singularity. For cross-entropy, the softmax partition function has complex zeros ("ghosts of softmax") that induce logarithmic singularities in the loss and cap this radius. To make this geometry usable, we derive closed-form expressions under logit linearization along the proposed update direction. In the binary case, the exact radius is . In the multiclass case, we obtain the lower bound , where is the spread of directional…
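To make the idea of a complex singularity capping the radius concrete, here is a minimal Python sketch for a two-class toy model. It is not the paper's formula, whose expressions are missing from the text above. It assumes the loss along the update direction is `log(1 + exp(margin + t * slope))`, where `margin` and `slope` are hypothetical names for the logit gap at the current point and its directional derivative. Under that assumption the partition function `1 + exp(margin + t * slope)` vanishes wherever the exponent equals an odd multiple of `i*pi`, so the zero nearest to `t = 0` lies at distance `sqrt(margin**2 + pi**2) / |slope|`. The `cap_step` helper is likewise an illustrative stand-in for a controller that keeps the normalized step below 1.

```python
import cmath
import math


def binary_radius(margin: float, slope: float) -> float:
    """Distance from t = 0 to the nearest complex zero of 1 + exp(margin + t*slope).

    Assumption: the logit gap is linear in t (hypothetical toy model).
    """
    if slope == 0.0:
        return math.inf  # the logit gap does not move along this direction
    return math.hypot(margin, math.pi) / abs(slope)


def nearest_singularity(margin: float, slope: float) -> complex:
    """Complex t at which margin + t*slope = i*pi, so 1 + exp(...) vanishes."""
    return complex(-margin, math.pi) / slope


def cap_step(tau: float, rho: float, r_max: float = 0.9) -> float:
    """Shrink tau so the normalized step tau / rho stays at or below r_max < 1.

    r_max is a hypothetical safety factor, not a value taken from the paper.
    """
    return min(tau, r_max * rho)


margin, slope = 2.0, 4.0
t_star = nearest_singularity(margin, slope)
rho = binary_radius(margin, slope)

# The partition function should vanish (up to rounding) at t_star.
residual = abs(1 + cmath.exp(margin + t_star * slope))

print(f"radius = {rho:.4f}, |t*| = {abs(t_star):.4f}, residual = {residual:.1e}")
print(f"capped step for tau = 1.0: {cap_step(1.0, rho):.4f}")
```

Even a well-classified example (large `margin`) has a finite radius in this toy model, because the imaginary part of the singularity never exceeds `pi / |slope|` in distance; a large directional change in the logits shrinks the safe step regardless of real-line curvature.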
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural dynamics and brain function · Neural Networks and Reservoir Computing
