Online normalizer calculation for softmax
Maxim Milakov (NVIDIA), Natalia Gimelshein (NVIDIA)

TL;DR
This paper introduces a memory-efficient method for computing the Softmax function. It reduces memory accesses and yields measurable speedups on actual hardware, especially when Softmax is fused with a TopK operation.
Contribution
It proposes computing the Softmax normalizer online, in the same pass that finds the maximum input. This cuts the memory accesses of classical, numerically safe Softmax from four to three per element, and benchmarks confirm the resulting speedup.
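The online normalizer rests on a short recurrence. Below is a worked derivation in our own notation (input $x_1, \dots, x_V$, running maximum $m_j$, running normalizer $d_j$), showing that a single pass produces exactly the denominator of numerically safe Softmax:

```latex
% Safe softmax over x_1..x_V
y_i = \frac{e^{x_i - m_V}}{\sum_{k=1}^{V} e^{x_k - m_V}}, \qquad m_V = \max_{1 \le k \le V} x_k

% Online recurrence, one pass over x
m_0 = -\infty, \quad d_0 = 0
m_j = \max(m_{j-1}, x_j)
d_j = d_{j-1}\, e^{m_{j-1} - m_j} + e^{x_j - m_j}

% Invariant d_j = \sum_{k=1}^{j} e^{x_k - m_j}, by induction on j:
d_j = \Big(\sum_{k=1}^{j-1} e^{x_k - m_{j-1}}\Big) e^{m_{j-1} - m_j} + e^{x_j - m_j}
    = \sum_{k=1}^{j} e^{x_k - m_j}

% Hence d_V is the safe-softmax denominator and
y_i = \frac{e^{x_i - m_V}}{d_V}
```

Rescaling by $e^{m_{j-1} - m_j}$ is what lets the normalizer be accumulated before the final maximum is known.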
Findings
Softmax runs up to 1.3x faster on actual hardware.
Softmax and TopK, combined and fused, run up to 5x faster.
Reducing memory accesses improves Softmax performance, confirming the paper's hypothesis.
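The memory-access finding can be illustrated with a minimal pure-Python sketch that contrasts the usual numerically safe Softmax with an online-normalizer variant. The function names `safe_softmax` and `online_softmax` are hypothetical. Pure Python will not reproduce the speedup, which comes from fewer trips to memory in a real kernel. The sketch only shows the pass structure and that both variants return the same probabilities.

```python
import math
from typing import List, Sequence


def safe_softmax(x: Sequence[float]) -> List[float]:
    """Numerically safe softmax: three passes over x (max, normalizer, output)."""
    m = max(x)                                   # pass 1: read x
    d = sum(math.exp(v - m) for v in x)          # pass 2: read x again
    return [math.exp(v - m) / d for v in x]      # pass 3: read x, write y


def online_softmax(x: Sequence[float]) -> List[float]:
    """Softmax with an online normalizer: max and normalizer share one pass."""
    m = -math.inf  # running maximum m_j
    d = 0.0        # running normalizer d_j, kept relative to the current m
    for v in x:    # pass 1: a single read of x yields both m and d
        m_new = max(m, v)
        # rescale the old sum to the new maximum, then add the new term
        d = d * math.exp(m - m_new) + math.exp(v - m_new)
        m = m_new
    return [math.exp(v - m) / d for v in x]      # pass 2: read x, write y
```

Per element, `safe_softmax` loads the input three times and stores the output once (four memory accesses), while `online_softmax` loads it twice and stores once (three accesses). Both subtract the maximum before exponentiating, so inputs such as `[1000.0, 1001.0, 1002.0]` do not overflow.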
Abstract
The Softmax function is ubiquitous in machine learning, and multiple previous works have suggested faster alternatives to it. In this paper we propose a way to compute classical Softmax with fewer memory accesses and hypothesize that this reduction in memory accesses should improve Softmax performance on actual hardware. The benchmarks confirm this hypothesis: Softmax accelerates by up to 1.3x, and Softmax+TopK, combined and fused, by up to 5x.
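The fused Softmax+TopK result can be sketched the same way: the online normalizer and a running top-k selection share one pass, so the input is read once and the full probability vector is never written. This is a minimal sketch assuming a heap-based top-k. The name `softmax_topk` is hypothetical, and the paper's GPU kernel is not reproduced here.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def softmax_topk(x: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return the k largest softmax probabilities as (index, probability) pairs.

    One pass over x maintains the running max m, the online normalizer d,
    and a size-k min-heap of the largest raw inputs. x is not re-read and
    the full softmax vector is never materialized.
    """
    if k <= 0:
        return []
    m = -math.inf
    d = 0.0
    heap: List[Tuple[float, int]] = []  # min-heap of (value, index), size <= k
    for i, v in enumerate(x):
        # online normalizer update, as in plain online softmax
        m_new = max(m, v)
        d = d * math.exp(m - m_new) + math.exp(v - m_new)
        m = m_new
        # keep only the k largest inputs seen so far
        if len(heap) < k:
            heapq.heappush(heap, (v, i))
        elif v > heap[0][0]:
            heapq.heapreplace(heap, (v, i))
    # softmax is monotonic, so the k largest inputs give the k largest probabilities
    top = sorted(heap, reverse=True)
    return [(i, math.exp(v - m) / d) for v, i in top]
```

For example, `softmax_topk([0.5, 3.0, -1.0, 2.0, 1.0], 2)` returns entries for indices 1 and 3, with probabilities equal to those of a full Softmax over all five inputs. Only the k selected outputs are ever computed and stored.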
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Neural Networks and Applications · Numerical Methods and Algorithms
Methods: Softmax
