Floating-Point Multiply-Add with Approximate Normalization for Low-Cost Matrix Engines

Kosmas Alexandridis; Christodoulos Peltekis; Dionysios Filippas; Giorgos Dimitrakopoulos

arXiv:2408.11997·cs.AR·August 23, 2024

Open Access

TL;DR

This paper introduces an approximate normalization technique for floating-point multiply-add units in matrix engines, significantly reducing hardware complexity and power consumption while maintaining acceptable accuracy in machine learning models.

Contribution

It presents an approximate normalization method that reduces the hardware area and power of floating-point multiply-add units in machine learning accelerators while keeping the loss in model accuracy small.

Findings

01

Roughly 16% area reduction and 13% power reduction on average for Bfloat16 units

02

1% average accuracy loss in transformer models

03

Effective energy efficiency improvement in matrix engines

Abstract

The widespread adoption of machine learning algorithms necessitates hardware acceleration to ensure efficient performance. This acceleration relies on custom matrix engines that operate on full or reduced-precision floating-point arithmetic. However, conventional floating-point implementations can be power hungry. This paper proposes a method to improve the energy efficiency of the matrix engines used to accelerate machine learning algorithms. Our approach leverages approximate normalization within the floating-point multiply-add units to reduce their hardware complexity without sacrificing overall machine-learning model accuracy. Hardware synthesis results show that this technique reduces area and power consumption by roughly 16% and 13% on average, respectively, for the Bfloat16 format. Also, the error introduced in transformer model accuracy is 1% on average, for the most efficient…
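
The abstract does not spell out how the normalization is approximated, so the Python sketch below is only an illustration of the general idea and should not be read as the authors' design. It assumes one plausible form of approximation: the normalization shifter may only shift by multiples of a coarse `step`, so up to `step - 1` leading zeros can remain in the significand. The names `leading_zeros`, `normalize`, `step`, and `width` are hypothetical and chosen for this example.

```python
def leading_zeros(value: int, width: int) -> int:
    """Count leading zeros of `value` viewed as a `width`-bit unsigned integer."""
    return width - value.bit_length()


def normalize(sig: int, exp: int, width: int, step: int = 1) -> tuple:
    """Left-shift `sig` toward the top of a `width`-bit field.

    step == 1 models exact normalization (the MSB ends up set).
    step > 1 is the ASSUMED approximation: only shifts that are multiples
    of `step` are allowed, so up to step - 1 leading zeros may remain.
    The exponent is decreased by the shift, so sig * 2**exp is unchanged.
    """
    if sig == 0:
        return 0, exp  # zero has no leading one to align
    shift = (leading_zeros(sig, width) // step) * step
    return sig << shift, exp - shift


# A 16-bit intermediate significand with five leading zeros.
sig, exp, width = 0x05B7, 0, 16

exact_sig, exact_exp = normalize(sig, exp, width, step=1)
approx_sig, approx_exp = normalize(sig, exp, width, step=4)

print(f"exact : {exact_sig:016b} exp={exact_exp}")
print(f"approx: {approx_sig:016b} exp={approx_exp}")
```

In this sketch both results represent the same value before rounding; the approximate result simply carries fewer significant bits into a fixed-width mantissa, which is where an accuracy loss could arise. A coarser shifter and a simpler leading-zero detector are the kind of hardware savings such a scheme would target.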

Taxonomy

Topics: Real-time simulation and control systems