Efficient softmax approximation for GPUs
Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

TL;DR
This paper introduces adaptive softmax, an efficient approximation method for training neural language models with large vocabularies on GPUs, which significantly reduces computation time while keeping accuracy close to that of the full softmax.
Contribution
The paper presents adaptive softmax, a novel approach that exploits the unbalanced word frequency distribution and the characteristics of GPU architectures to approximate the softmax efficiently in large-vocabulary models.
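A minimal pure-Python sketch of the two-level factorization behind this idea, under stated assumptions: words are sorted by frequency and split at hypothetical cutoffs into a head (the most frequent words plus one slot per tail cluster) and tail clusters, and the logits are already computed. The names `partition_by_frequency` and `adaptive_log_prob` are illustrative, not taken from the released code, and the sketch omits batching and any per-cluster projections. The point it shows: a head word needs only the head softmax, and a tail word needs the head softmax plus the softmax of a single tail cluster.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def partition_by_frequency(counts: Sequence[int], cutoffs: Sequence[int]) -> List[List[int]]:
    """Sort word ids by decreasing count and split them at the (assumed) cutoffs.

    Cluster 0 is the head (most frequent words); the rest are tail clusters.
    """
    order = sorted(range(len(counts)), key=lambda w: -counts[w])
    bounds = [0, *cutoffs, len(counts)]
    return [order[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


def adaptive_log_prob(word: int,
                      clusters: List[List[int]],
                      head_logits: Sequence[float],
                      tail_logits: Sequence[Sequence[float]]) -> float:
    """log p(word) = log p_head(word) for head words,
    else log p_head(cluster j) + log p_j(word | cluster j)."""
    head = clusters[0]
    n_tail = len(clusters) - 1
    if len(head_logits) != len(head) + n_tail:
        raise ValueError("head needs one logit per head word plus one per tail cluster")
    head_lp = log_softmax(head_logits)
    if word in head:
        return head_lp[head.index(word)]  # tail clusters are never evaluated
    for j, cluster in enumerate(clusters[1:]):
        if word in cluster:
            # Only the one tail cluster containing the word is evaluated.
            tail_lp = log_softmax(tail_logits[j])
            return head_lp[len(head) + j] + tail_lp[cluster.index(word)]
    raise ValueError(f"word {word} is not in any cluster")
```

Usage on a toy six-word vocabulary: `partition_by_frequency([5, 50, 1, 20, 2, 8], [2, 4])` gives `[[1, 3], [5, 0], [4, 2]]`, and summing `math.exp(adaptive_log_prob(w, ...))` over all six words gives 1, so the factorization still defines a proper distribution.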
Findings
Large efficiency gains over standard softmax approximations
Accuracy close to that of the full softmax
Effective on large-scale benchmarks like EuroParl and One Billion Word
Abstract
We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach further reduces the computational time by exploiting the specificities of modern architectures and matrix-matrix vector operations, making it particularly suited for graphical processing units. Our experiments carried out on standard benchmarks, such as EuroParl and One Billion Word, show that our approach brings a large gain in efficiency over standard approximations while achieving an accuracy close to that of the full softmax. The code of our method is available at https://github.com/facebookresearch/adaptive-softmax.
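The idea of choosing clusters to minimize the expected computation time can be illustrated with a toy search. This is a hedged sketch, not the paper's procedure: it assumes a Zipf-distributed vocabulary sorted by frequency and a hypothetical cost model (`matmul_cost`) in which scoring k outputs costs the same below a threshold `k0` (mimicking an underused GPU) and grows linearly above it. The head is scored for every token, while each tail cluster is scored only with the probability that the target word falls inside it. The exhaustive search over cutoffs is only practical for toy vocabularies.

```python
import itertools
from typing import List, Sequence, Tuple


def zipf_probs(vocab_size: int, s: float = 1.0) -> List[float]:
    """Toy unigram distribution: p(rank r) proportional to 1 / r**s."""
    weights = [1.0 / r ** s for r in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def prefix_sums(probs: Sequence[float]) -> List[float]:
    """prefix[i] = total probability of the i most frequent words."""
    out = [0.0]
    for p in probs:
        out.append(out[-1] + p)
    return out


def matmul_cost(k: int, k0: int = 32) -> float:
    """Hypothetical cost of scoring k outputs: flat below k0, linear above."""
    return float(max(k, k0))


def expected_cost(prefix: Sequence[float], cutoffs: Sequence[int], k0: int = 32) -> float:
    """Expected per-token cost; head = ranks [0, c1), tail j = [c_j, c_{j+1})."""
    vocab_size = len(prefix) - 1
    bounds = [0, *cutoffs, vocab_size]
    # Head: frequent words plus one slot per tail cluster, scored for every token.
    cost = matmul_cost(bounds[1] + len(cutoffs), k0)
    # Tail cluster: scored only when the target word belongs to it.
    for lo, hi in zip(bounds[1:-1], bounds[2:]):
        cost += (prefix[hi] - prefix[lo]) * matmul_cost(hi - lo, k0)
    return cost


def best_cutoffs(probs: Sequence[float], n_tail: int,
                 k0: int = 32) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive search over increasing cutoffs (toy vocabularies only)."""
    prefix = prefix_sums(probs)
    best_cost, best_cuts = float("inf"), ()
    for cuts in itertools.combinations(range(1, len(probs)), n_tail):
        cost = expected_cost(prefix, cuts, k0)
        if cost < best_cost:
            best_cost, best_cuts = cost, cuts
    return best_cost, best_cuts
```

With `probs = zipf_probs(200)`, a plain softmax (`expected_cost(prefix_sums(probs), ())`) costs 200.0 under this model, and `best_cutoffs(probs, n_tail=2)` finds cutoffs with a much lower expected cost, because the rarely needed tail clusters are paid for only in proportion to their small total probability.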
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications
Methods: Adaptive Softmax
