With Shared Microexponents, A Little Shifting Goes a Long Way

Bita Rouhani; Ritchie Zhao; Venmugil Elango; Rasoul Shafipour; Mathew Hall; Maral Mesmakhosroshahi; Ankit More; Levi Melnick; Maximilian Golub; Girish Varatkar; Lei Shao; Gaurav Kolhe; Dimitry Melts; Jasmine Klar; Renee L'Heureux; Matt Perry; Doug Burger; Eric Chung; Zhaoxia Deng; Sam Naghshineh; Jongsoo Park; Maxim Naumov

arXiv:2302.08007 · cs.LG · April 14, 2023

TL;DR

This paper presents Block Data Representations (BDR), a framework for comparing narrow-precision number formats, and uses it to identify shared microexponent (MX) formats that improve quantization efficiency and outperform existing approaches such as narrow-precision floating-point and block floating-point.

Contribution

The paper proposes a shared microexponent (MX) quantization format within the BDR framework, offering better performance and flexibility for narrow-precision deep learning applications.
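The storage cost of shared-scale formats follows from simple arithmetic: each element keeps its own sign and mantissa bits, while scale bits are shared and amortized over a block. The sketch below computes the average bits per element for a two-level layout. The parameter names (`mant_bits`, `d1`, `k1`, `d2`, `k2`) and the example values are assumptions chosen for illustration, not bit allocations taken from the paper.

```python
def bits_per_element(mant_bits: int, d1: int, k1: int, d2: int = 0, k2: int = 1) -> float:
    """Average storage per element for a hypothetical two-level shared-scale format.

    Each element stores 1 sign bit plus `mant_bits` mantissa bits.
    A d1-bit shared exponent is amortized over a block of k1 elements,
    and a d2-bit microexponent over each sub-block of k2 elements.
    Setting d2=0 gives a single-level block floating-point layout.
    """
    return 1 + mant_bits + d1 / k1 + d2 / k2


# Illustrative configuration (assumed): 8-bit exponent per 16 elements,
# 1-bit microexponent per pair of elements, 7 mantissa bits.
print(bits_per_element(7, 8, 16, 1, 2))  # 9.0  (two-level)
print(bits_per_element(7, 8, 16))        # 8.5  (single-level, same mantissa width)
```

In this configuration the second scale level costs only d2/k2 = 0.5 extra bits per element; that overhead is what the finer scaling must repay in accuracy.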

Findings

1. MX outperforms state-of-the-art quantization methods
2. Effective on large-scale generative models
3. Improves efficiency in recommendation systems

Abstract

This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. BDR enables comparison of popular quantization standards, and through it the authors identify new formats based on shared microexponents (MX) that outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX uses multiple levels of quantization scaling, with ultra-fine scaling factors implemented in hardware as shared microexponents. The effectiveness of MX is demonstrated on real-world models, including large-scale generative pretraining and inference, and production-scale recommendation systems.
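To make multi-level scaling concrete, the following NumPy sketch compares single-level block floating-point (one shared power-of-two exponent per block) with a two-level variant in which each small sub-block may lower the block exponent by a shared microexponent. The block sizes, bit widths, exponent selection, and round-half-to-even rounding are illustrative assumptions; this is not the paper's reference implementation or the rocm/tensorcast API. Inputs are assumed to be 1-D arrays.

```python
import numpy as np


def shared_exponent(block):
    """floor(log2(max |x|)) for a block; 0 for an all-zero block (assumed convention)."""
    amax = np.max(np.abs(block))
    return int(np.floor(np.log2(amax))) if amax > 0 else 0


def quantize_to_exponent(block, exp, mant_bits):
    """Round to signed integers with `mant_bits` magnitude bits at scale 2**(exp - mant_bits + 1)."""
    scale = 2.0 ** (exp - mant_bits + 1)
    qmax = 2 ** mant_bits - 1
    q = np.clip(np.round(block / scale), -qmax, qmax)  # np.round rounds half to even
    return q * scale


def quantize_bfp(x, k1=16, mant_bits=7):
    """Single-level block floating-point: one shared exponent per k1 elements."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for i in range(0, x.size, k1):
        blk = x[i:i + k1]
        out[i:i + k1] = quantize_to_exponent(blk, shared_exponent(blk), mant_bits)
    return out


def quantize_two_level(x, k1=16, k2=2, micro_bits=1, mant_bits=7):
    """Two-level scaling: block exponent per k1 elements, plus a `micro_bits`-bit
    downward shift of that exponent shared by each sub-block of k2 elements."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    max_shift = 2 ** micro_bits - 1
    for i in range(0, x.size, k1):
        blk = x[i:i + k1]
        e = shared_exponent(blk)
        for j in range(0, blk.size, k2):
            sub = blk[j:j + k2]
            amax = np.max(np.abs(sub))
            sub_e = int(np.floor(np.log2(amax))) if amax > 0 else e - max_shift
            # Shift as far as the sub-block's own magnitude allows, capped by the microexponent width.
            shift = min(e - sub_e, max_shift)
            out[i + j:i + j + k2] = quantize_to_exponent(sub, e - shift, mant_bits)
    return out


# Usage: compare mean squared quantization error on random Gaussian data.
x = np.random.default_rng(0).standard_normal(4096)
print("BFP MSE:      ", np.mean((quantize_bfp(x) - x) ** 2))
print("two-level MSE:", np.mean((quantize_two_level(x) - x) ** 2))
```

A sub-block whose values are small relative to the block maximum gets a scale up to 2**(2**micro_bits - 1) times finer, so its rounding error shrinks; that is what the extra microexponent bits buy. With `micro_bits=0` the two-level function reduces exactly to the single-level one.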

Code & Models

Repositories

rocm/tensorcast (PyTorch)

Taxonomy

Topics: Advanced Data Compression Techniques · Neural Networks and Reservoir Computing · Neural Networks and Applications