Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers

Jacob R. Stevens; Rangharajan Venkatesan; Steve Dai; Brucek Khailany; Anand Raghunathan

arXiv:2103.09301·cs.AR·March 18, 2021

TL;DR

Softermax is a hardware/software co-designed softmax for Transformer models that delivers 2.35x the energy efficiency at 0.90x the size of a comparable baseline, with negligible accuracy loss.

Contribution

This paper introduces Softermax, a novel hardware-friendly softmax implementation optimized for Transformers, combining base replacement, low-precision computation, and online normalization.

Findings

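01  2.35x the energy efficiency of a comparable baseline

02  0.90x the size of the baseline, with negligible impact on network accuracy

03  Hardware/software co-design of softmax for Transformers, combining base replacement, low-precision computation, and online normalization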

Abstract

Transformers have transformed the field of natural language processing. This performance is largely attributed to the use of stacked self-attention layers, each of which consists of matrix multiplies as well as softmax operations. As a result, unlike other neural networks, the softmax operation accounts for a significant fraction of the total run-time of Transformers. To address this, we propose Softermax, a hardware-friendly softmax design. Softermax consists of base replacement, low-precision softmax computations, and an online normalization calculation. We show Softermax results in 2.35x the energy efficiency at 0.90x the size of a comparable baseline, with negligible impact on network accuracy.
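
Two of the ingredients named in the abstract, base replacement and online normalization, can be illustrated with a short Python sketch. This is not the authors' implementation: it assumes that base replacement means swapping e for 2 in the exponential, it uses ordinary floating point instead of the low-precision arithmetic the paper describes, and the function names `softmax_reference` and `base2_softmax_online` are hypothetical.

```python
import math


def softmax_reference(xs):
    """Standard base-e softmax; needs one pass for the max, one for the sum."""
    if not xs:
        return []
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def base2_softmax_online(xs):
    """Base-2 softmax with the max and the sum accumulated in a single pass.

    Assumption: inputs are finite floats. Floating point is used here for
    clarity; the low-precision arithmetic from the paper is not modeled.
    """
    if not xs:
        return []
    running_max = xs[0]
    running_sum = 1.0  # 2 ** (xs[0] - running_max)
    for x in xs[1:]:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new maximum,
        # then add the contribution of the current element.
        running_sum = (running_sum * 2.0 ** (running_max - new_max)
                       + 2.0 ** (x - new_max))
        running_max = new_max
    # Final pass: normalize each element by the accumulated sum.
    return [2.0 ** (x - running_max) / running_sum for x in xs]


if __name__ == "__main__":
    scores = [1.0, 3.0, -2.0, 0.5]
    print(base2_softmax_online(scores))
    # Base-2 softmax of x equals base-e softmax of x * ln(2).
    print(softmax_reference([s * math.log(2.0) for s in scores]))
```

Because 2^x = e^(x ln 2), the base-2 variant is the standard softmax applied to inputs scaled by ln 2, so the two printed lists should agree up to rounding. The online form avoids a separate pass over the inputs to find the maximum before summing.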
