Online normalizer calculation for softmax

Maxim Milakov (NVIDIA); Natalia Gimelshein (NVIDIA)

arXiv:1805.02867 · cs.PF · July 31, 2018 · 20 cites

Open Access · 1 Repo

TL;DR

This paper introduces a way to compute the classical Softmax function with fewer memory accesses. On actual hardware this speeds up Softmax by up to 1.3x, and by up to 5x when Softmax is combined and fused with TopK.

Contribution

The paper proposes a way to compute classical Softmax with fewer memory accesses, hypothesizes that this reduction should improve performance on actual hardware, and confirms the hypothesis with benchmarks.
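This page does not spell out the algorithm itself, so the following is only a minimal Python sketch of one way to save a pass over the input. It assumes the running-maximum and rescaled-sum recurrence that the title's "online normalizer" suggests. The function names `safe_softmax` and `online_softmax` are illustrative, and this is not the authors' GPU implementation.

```python
import math


def safe_softmax(x):
    """Reference version: three passes over x (max, normalizer, output)."""
    m = max(x)                                  # pass 1: maximum
    d = sum(math.exp(v - m) for v in x)         # pass 2: normalizer
    return [math.exp(v - m) / d for v in x]     # pass 3: output


def online_softmax(x):
    """Sketch: maximum and normalizer are computed together in one pass."""
    m = float("-inf")   # running maximum seen so far
    d = 0.0             # running normalizer, expressed relative to m
    for v in x:
        m_new = max(m, v)
        # Rescale the old sum to the new maximum, then add the new term.
        d = d * math.exp(m - m_new) + math.exp(v - m_new)
        m = m_new
    return [math.exp(v - m) / d for v in x]     # second pass: output
```

In this sketch the input is read twice instead of three times, which is the kind of memory-access reduction the contribution describes. Subtracting the running maximum keeps the exponentials from overflowing for large inputs.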

Findings

01 Softmax accelerates by up to 1.3x on hardware.

02 Combined Softmax+TopK accelerates by up to 5x.

03 Memory access reduction improves Softmax performance.

Abstract

The Softmax function is ubiquitous in machine learning, and multiple previous works have suggested faster alternatives for it. In this paper we propose a way to compute classical Softmax with fewer memory accesses and hypothesize that this reduction in memory accesses should improve Softmax performance on actual hardware. The benchmarks confirm this hypothesis: Softmax is accelerated by up to 1.3x, and Softmax+TopK combined and fused by up to 5x.
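To illustrate what fusing Softmax with TopK can look like, here is a hedged Python sketch that tracks the normalizer and the k largest inputs in the same single pass, so the full probability vector is never written out. The heap-based selection and the name `softmax_topk` are assumptions made for illustration. The abstract does not describe how the authors' fused kernel selects the top elements.

```python
import heapq
import math


def softmax_topk(x, k):
    """Return [(index, probability)] for the k largest softmax outputs.

    Illustrative sketch: one pass over x updates the running maximum,
    the running normalizer, and a min-heap holding the k largest inputs.
    """
    m = float("-inf")   # running maximum
    d = 0.0             # running normalizer relative to m
    top = []            # min-heap of (value, index), at most k entries
    for i, v in enumerate(x):
        m_new = max(m, v)
        d = d * math.exp(m - m_new) + math.exp(v - m_new)
        m = m_new
        if len(top) < k:
            heapq.heappush(top, (v, i))
        elif top and v > top[0][0]:
            heapq.heapreplace(top, (v, i))
    top.sort(reverse=True)  # largest input first
    # Only the k selected entries are turned into probabilities.
    return [(i, math.exp(v - m) / d) for v, i in top]
```

Because softmax is monotonic, the k largest inputs are also the k largest probabilities, so selection can run on the raw inputs while the normalizer is still being accumulated.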

Code & Models

Repositories

NVIDIA/online-softmax (Official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Neural Networks and Applications · Numerical Methods and Algorithms

Methods: Softmax