Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications
Jin Chen, Jin Zhang, Xu Huang, Yi Yang, Defu Lian, Enhong Chen

TL;DR
This paper introduces the MIDX Sampler, an adaptive sampling method for softmax in large-scale classification that improves efficiency and accuracy through an inverted multi-index approach, backed by theoretical analysis and extensive experiments.
Contribution
The paper proposes the MIDX Sampler, a novel adaptive sampling strategy for softmax that reduces computational complexity and improves approximation accuracy using an inverted multi-index decomposition.
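The abstract below is truncated before it spells out the decomposition, so what follows is only a hypothetical illustration of how an inverted multi-index can drive a factorized proposal. It is not the paper's MIDX Sampler. The sketch assumes that each class embedding is approximated by the sum of one codeword from each of two codebooks, and it groups classes into cells by their codeword pair. It also drops any per-class residual, so the class is drawn uniformly inside the chosen cell. The helper names (`build_cells`, `midx_style_sample`) are made up for this sketch.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def build_cells(assignments):
    """Group class ids by their (first-codeword, second-codeword) pair."""
    cells = {}
    for cls, (a, b) in enumerate(assignments):
        cells.setdefault((a, b), []).append(cls)
    return cells


def midx_style_sample(query, codebook1, codebook2, cells, rng):
    """Draw one class from q(i) proportional to exp(z.c1[a_i] + z.c2[b_i]).

    Hypothetical sketch: residuals are ignored, so sampling is uniform within
    a cell. Returns (class id, its exact proposal probability q(class id)).
    """
    s1 = [dot(query, c) for c in codebook1]  # query scores for codebook 1
    s2 = [dot(query, c) for c in codebook2]  # query scores for codebook 2
    shift = max(s1) + max(s2)  # subtract before exp to avoid overflow
    # mass[a][b] = exp(s1[a] + s2[b]) * |cell(a, b)|, up to the shared shift
    mass = [
        [math.exp(s1[a] + s2[b] - shift) * len(cells.get((a, b), []))
         for b in range(len(codebook2))]
        for a in range(len(codebook1))
    ]
    row_mass = [sum(row) for row in mass]
    total = sum(row_mass)
    a = rng.choices(range(len(codebook1)), weights=row_mass)[0]  # P(a)
    b = rng.choices(range(len(codebook2)), weights=mass[a])[0]   # P(b | a)
    cls = rng.choice(cells[(a, b)])                               # uniform in cell
    q = math.exp(s1[a] + s2[b] - shift) / total                   # exact q(cls)
    return cls, q
```

Scoring the query against K1 + K2 codewords and filling a K1 × K2 table of cell masses replaces the O(N·d) pass over all N classes. The returned q(i) is the quantity a sampled-softmax loss needs for its log-q correction.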
Findings
The MIDX Sampler achieves faster convergence than existing methods.
The method improves generalization in large-scale models.
Experimental results show superior efficiency and effectiveness.
Abstract
The softmax function is a cornerstone of multi-class classification, integral to a wide range of machine learning applications, from large-scale retrieval and ranking models to advanced large language models. However, its computational cost grows linearly with the number of classes, which becomes prohibitively expensive in scenarios with millions or even billions of classes. The sampled softmax, which relies on self-normalized importance sampling, has emerged as a powerful alternative, significantly reducing computational complexity. Yet, its estimator remains unbiased only when the sampling distribution matches the true softmax distribution. To improve both approximation accuracy and sampling efficiency, we propose the MIDX Sampler, a novel adaptive sampling strategy based on an inverted multi-index approach. Concretely, we decompose the softmax probability into several multinomial…
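The sampled-softmax idea described in the abstract can be shown in a minimal sketch. It uses a generic formulation and is not code from the paper. The function draws M classes from a proposal q and applies the usual log-q correction o_j - log(M q_j). It then returns the resulting estimate of the cross-entropy loss together with the self-normalized importance weights, one per draw. These weights are the gradient of the estimated log-partition with respect to the sampled logits. This variant leaves the target out of the denominator, although many implementations include it. The names `sampled_softmax` and `logsumexp` are illustrative.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sampled_softmax(logits, target, proposal, num_samples, rng):
    """Sampled-softmax estimate of the cross-entropy loss (illustrative sketch).

    logits:   scores o_i for every class (passed in full only so the result can
              be compared with the exact softmax; a real system would score
              just the target and the sampled classes)
    proposal: sampling probabilities q_i over classes, summing to 1
    Returns (estimated loss, sampled class ids, self-normalized weights).
    """
    samples = rng.choices(range(len(logits)), weights=proposal, k=num_samples)
    # log-q correction: exp(o_j - log(M * q_j)) averages to the partition Z
    corrected = [logits[j] - math.log(num_samples * proposal[j]) for j in samples]
    log_z_hat = logsumexp(corrected)  # log of the importance-sampling estimate of Z
    # d(log_z_hat)/d(corrected_j): self-normalized importance weights, sum to 1
    weights = [math.exp(c - log_z_hat) for c in corrected]
    return log_z_hat - logits[target], samples, weights
```

When q equals the true softmax distribution, every corrected logit equals log Z - log M. The estimate then reproduces the exact loss log Z - o_y, and each weight is 1/M. Any other proposal, such as a uniform one, gives a noisy estimate whose quality depends on how closely q tracks the softmax.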
Taxonomy
Topics: Neural Networks and Applications
Methods: Sparse Evolutionary Training · Softmax
