A Stochastic Rounding-Enabled Low-Precision Floating-Point MAC for DNN Training
Sami Ben Ali (TARAN), Silviu-Ioan Filip (TARAN), Olivier Sentieys (TARAN)

TL;DR
This paper introduces a low-precision floating-point MAC unit that combines stochastic rounding with reduced accumulator precision, lowering hardware cost while maintaining DNN training accuracy across several vision tasks.
Contribution
It proposes a novel MAC unit with FP8 multiplier inputs, FP12 accumulation, and stochastic rounding, which reduces hardware complexity and power consumption without sacrificing model accuracy.
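To make the FP8-multiply / FP12-accumulate idea concrete, here is a minimal Python sketch of such a MAC, written under stated assumptions rather than taken from the paper. It assumes IEEE-style formats with E5M2 for the FP8 inputs and a hypothetical E6M5 split for the FP12 accumulator (the exact layouts are not given here), and models stochastic rounding (SR) the way hardware typically does it, with a fixed number of random bits (8 by default, also an assumption). `quantize` and `mac` are illustrative names, not the authors' code.

```python
import math
import random

# Assumed formats as (exponent bits, mantissa bits). E5M2 is one common FP8
# layout; the E6M5 split for FP12 is a hypothetical choice for this sketch.
FP8 = (5, 2)
FP12 = (6, 5)


def quantize(x, fmt, mode="rn", rand_bits=8, rng=random):
    """Round a float to an IEEE-style (exp_bits, man_bits) format.

    mode="rn": round to nearest, ties to even.
    mode="sr": stochastic rounding that consumes `rand_bits` random bits.
    Subnormals are kept; overflow saturates to the largest finite value.
    """
    exp_bits, man_bits = fmt
    if x == 0.0 or not math.isfinite(x):
        return x
    bias = 2 ** (exp_bits - 1) - 1
    emin, emax = 1 - bias, bias                # top exponent code reserved
    a = abs(x)
    # frexp gives a = m * 2**k with 0.5 <= m < 1, so floor(log2(a)) == k - 1.
    e = max(math.frexp(a)[1] - 1, emin)
    ulp = 2.0 ** (e - man_bits)                # spacing of representable values near a
    scaled = a / ulp                           # exact: division by a power of two
    kept = math.floor(scaled)                  # significand after truncation
    if mode == "rn":
        q = round(scaled)                      # Python's round() ties to even
    elif mode == "sr":
        # Hardware-style SR: add an r-bit random integer to the top r
        # discarded bits; a carry out of those bits rounds the result up.
        top = math.floor((scaled - kept) * 2 ** rand_bits)
        q = kept + ((top + rng.getrandbits(rand_bits)) >> rand_bits)
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** emax
    return math.copysign(min(q * ulp, max_val), x)


def mac(xs, ws, mode, rng=random):
    """Dot product: FP8 inputs, exact products, FP12 accumulator."""
    acc = 0.0
    for x, w in zip(xs, ws):
        p = quantize(x, FP8) * quantize(w, FP8)       # exact in float64
        acc = quantize(acc + p, FP12, mode, rng=rng)  # round after every addition
    return acc


xs, ws = [1.0] * 4096, [0.0625] * 4096  # every product is 1/16; exact sum is 256
print("RN:", mac(xs, ws, "rn"))         # stalls at 4.0 (swamping)
print("SR:", mac(xs, ws, "sr", random.Random(0)))
```

With 5 mantissa bits, once the accumulator reaches 4 each product of 1/16 is at most half an ulp, so round-to-nearest-even discards it and the sum freezes at 4.0. SR instead rounds up with probability proportional to the discarded fraction, so the accumulated value stays unbiased around 256, at the cost of some variance.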
Findings
Significant reduction in MAC area and power consumption.
Maintains near baseline accuracy across multiple vision tasks.
Optimized stochastic rounding implementation improves hardware efficiency.
Abstract
Training Deep Neural Networks (DNNs) can be computationally demanding, particularly when dealing with large models. Recent work has aimed to mitigate this computational challenge by introducing 8-bit floating-point (FP8) formats for multiplication. However, accumulations are still done in either half (16-bit) or single (32-bit) precision arithmetic. In this paper, we investigate lowering the accumulator word length while maintaining the same model accuracy. We present a multiply-accumulate (MAC) unit with FP8 multiplier inputs and FP12 accumulations, which leverages an optimized stochastic rounding (SR) implementation to mitigate swamping errors that commonly arise during low-precision accumulation. We investigate the hardware implications and accuracy impact associated with varying the number of random bits used for rounding operations. We additionally attempt to reduce MAC area and power…
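The trade-off around the number of random bits can be illustrated with a small experiment. The Python sketch below is a hypothetical model, not the authors' design: a hardware-style SR step that sees only the top r discarded bits, applied to an accumulator with 5 mantissa bits and, as a simplifying assumption, an unbounded exponent range (no subnormals or overflow). With r random bits, a discarded fraction f of an ulp rounds up with probability ⌊f·2^r⌋/2^r, so any addend smaller than 2^-r ulp of the accumulator is always truncated and swamping returns. `sr_quantize` and `accumulate` are illustrative names.

```python
import math
import random


def sr_quantize(x, man_bits, rand_bits, rng):
    """Stochastically round x to a float with `man_bits` fraction bits.

    Sketch only: unbounded exponent range (no subnormals, no overflow).
    Only the top `rand_bits` discarded bits take part in the rounding decision.
    """
    if x == 0.0:
        return 0.0
    a = abs(x)
    ulp = 2.0 ** (math.frexp(a)[1] - 1 - man_bits)  # spacing near a
    kept = math.floor(a / ulp)                       # truncated significand
    frac = a / ulp - kept                            # discarded part, in [0, 1)
    top = math.floor(frac * 2 ** rand_bits)          # bits beyond r are lost
    q = kept + ((top + rng.getrandbits(rand_bits)) >> rand_bits)
    return math.copysign(q * ulp, x)


def accumulate(n, addend, man_bits, rand_bits, seed=0):
    """Add `addend` n times, rounding the running sum with SR after each add."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        acc = sr_quantize(acc + addend, man_bits, rand_bits, rng)
    return acc


exact = 4096 * 0.0625
for r in (2, 4, 6, 8, 12):
    print(f"r={r:2d}  sum={accumulate(4096, 0.0625, man_bits=5, rand_bits=r)}  exact={exact}")
```

With an addend of 1/16 and 5 mantissa bits, the discarded fraction halves each time the accumulator crosses a power of two, so with r random bits the sum stops growing once it reaches 2^(r+2): 16 for r = 2 and 64 for r = 4. With enough bits (here r = 12) the sum stays unbiased around the exact value of 256. In hardware, each extra random bit costs area and power, which is why the choice of r matters.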
