Adaptive Sparse Softmax: An Effective and Efficient Softmax Variant

Qi Lv; Lei Geng; Ziqiang Cao; Min Cao; Sujian Li; Wenjie Li; and Guohong Fu

arXiv:2508.03175 · cs.LG · August 6, 2025

TL;DR

The paper introduces Adaptive Sparse Softmax (AS-Softmax), a softmax variant that improves classification accuracy and training efficiency by focusing training on relevant classes and adaptively accelerating learning, with gains across various modalities.

Contribution

It proposes two components: a softmax transformation that discards irrelevant classes during training, and an adaptive gradient strategy that speeds up training. Together they improve efficiency and performance across multiple tasks.
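
To make the "discard irrelevant classes" idea concrete, here is a minimal sketch in Python. It is not the authors' implementation. The discard rule is an assumption: a non-target class is dropped when its softmax probability is more than a margin `delta` below the target's probability, and cross-entropy is then taken over the surviving classes only. The names `as_softmax_loss`, `log_softmax`, and `delta` are hypothetical.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def as_softmax_loss(logits: list[float], target: int, delta: float) -> float:
    """Hypothetical AS-Softmax-style loss (assumed rule, 0 < delta < 1).

    A non-target class is discarded when its softmax probability is more
    than `delta` below the target's probability. The target is always kept.
    Cross-entropy is then computed over the kept classes only.
    """
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    p_t = probs[target]
    keep = [i for i, p in enumerate(probs) if i == target or p >= p_t - delta]
    kept_logits = [logits[i] for i in keep]
    # -log of the target's probability after renormalizing over kept classes.
    return -log_softmax(kept_logits)[keep.index(target)]


# Easy sample: every competitor is far behind, so all are discarded.
print(as_softmax_loss([5.0, 0.0, 0.0], target=0, delta=0.5))
# Hard sample: nothing is discarded, so this equals ordinary cross-entropy.
print(as_softmax_loss([1.0, 1.0, 0.0], target=0, delta=0.5))
```

Under this assumed rule, a sample whose target beats every other class by more than `delta` contributes zero loss. This matches the test goal, which only requires the target to hold the maximum score. Harder samples fall back to cross-entropy over the classes that are still competitive.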

Findings

1. AS-Softmax outperforms standard softmax and its variants on diverse classification tasks.
2. The adaptive gradient strategy increases training speed by approximately 20%.
3. The training loss correlates strongly with validation accuracy, suggesting that it reliably tracks model performance.

Abstract

Softmax with the cross-entropy loss is the standard configuration for current neural classification models. The gold score for a target class is supposed to be 1, but it is never reachable under the softmax schema. As a result, training continues indefinitely and leads to overfitting. Moreover, the "target-approach-1" training goal forces the model to keep learning from all samples, wasting time on samples that are already classified correctly with high confidence. The test goal, by contrast, only requires the target class of each sample to hold the maximum score. To address these weaknesses, we propose Adaptive Sparse Softmax (AS-Softmax), which designs a reasonable, test-matching transformation on top of softmax. For more purposeful learning, we discard the classes with far smaller scores compared with the actual class during…
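
The claim that a gold score of 1 is never reachable can be checked with a few lines of Python. This is an illustrative sketch rather than code from the paper. The name `softmax_ce` and the example logits are our own choices. For logits `[g, 0, 0]` with target 0, the loss is exactly log(1 + 2e^(-g)). It shrinks as the margin g grows but stays positive for every finite g. Plain cross-entropy therefore keeps rewarding larger margins even on samples that are already classified correctly.

```python
import math


def softmax_ce(logits: list[float], target: int) -> float:
    """Standard softmax cross-entropy, -log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


# Push the target logit further ahead of two competing classes.
margins = [1.0, 5.0, 10.0, 20.0]
losses = [softmax_ce([g, 0.0, 0.0], target=0) for g in margins]

for g, loss in zip(margins, losses):
    # exp(-loss) is the target's softmax probability: it approaches 1 but stays below it.
    print(f"margin={g:5.1f}  p_target={math.exp(-loss):.12f}  loss={loss:.3e}")
```

In floating point the loss eventually rounds to zero for very large margins, but mathematically it never reaches zero.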
