With Shared Microexponents, A Little Shifting Goes a Long Way
Bita Rouhani, Ritchie Zhao, Venmugil Elango, Rasoul Shafipour, Mathew Hall, Maral Mesmakhosroshahi, Ankit More, Levi Melnick, Maximilian Golub, Girish Varatkar, Lei Shao, Gaurav Kolhe, Dimitry Melts, Jasmine Klar, Renee L'Heureux, Matt Perry, Doug Burger, Eric Chung

TL;DR
This paper presents Block Data Representations (BDR), a framework for comparing narrow-precision number formats, and introduces shared microexponent (MX) formats that outperform existing approaches such as narrow-precision floating-point and block floating-point when quantizing deep learning models.
Contribution
The paper proposes a novel shared microexponent quantization format within the BDR framework, offering better accuracy and flexibility for narrow-precision deep learning.
Findings
MX outperforms state-of-the-art quantization methods
Effective on large-scale generative models
Improves efficiency in recommendation systems
Abstract
This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.
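The abstract describes MX's "multiple levels of quantization scaling" only at a high level. The following is a minimal NumPy sketch of what two-level scaling can look like, under stated assumptions. A block of k1 elements shares one d1-bit exponent, and each sub-block of k2 elements carries a d2-bit microexponent that shifts the scale down when that sub-block holds only small values. The function names `quantize_mx` and `bits_per_element`, the default parameters (k1=16, k2=2, d1=8, d2=1, 4 mantissa bits), the round-to-nearest rule, and the exponent clamping are all illustrative assumptions. They are not the paper's reference implementation or exact format definition.

```python
import numpy as np

def quantize_mx(x, k1=16, k2=2, mantissa_bits=4, d1=8, d2=1):
    """Hypothetical two-level shared-exponent quantizer (illustrative sketch).

    k1: elements sharing one d1-bit exponent (level 1).
    k2: elements sharing one d2-bit microexponent (level 2).
    mantissa_bits: magnitude bits per element; the sign is kept separately.
    """
    x = np.asarray(x, dtype=np.float64)
    assert x.size % k1 == 0 and k1 % k2 == 0
    blocks = x.reshape(-1, k1)
    out = np.empty_like(blocks)
    emax = 2 ** (d1 - 1) - 1          # assumed symmetric range for the shared exponent
    max_shift = 2 ** d2 - 1           # largest shift a d2-bit microexponent can encode
    qmax = 2 ** mantissa_bits - 1     # largest integer magnitude per element
    for i, blk in enumerate(blocks):
        amax = np.max(np.abs(blk))
        if amax == 0:
            out[i] = 0.0
            continue
        # Level 1: one exponent per block, taken from its largest magnitude.
        e_shared = int(np.clip(np.floor(np.log2(amax)), -emax, emax))
        sub = blk.reshape(-1, k2)
        q = np.empty_like(sub)
        for j, s in enumerate(sub):
            smax = np.max(np.abs(s))
            if smax == 0:
                shift = max_shift
            else:
                # Level 2: shift the scale down for small sub-blocks, up to max_shift.
                e_sub = int(np.floor(np.log2(smax)))
                shift = int(np.clip(e_shared - e_sub, 0, max_shift))
            scale = 2.0 ** (e_shared - shift + 1 - mantissa_bits)
            q[j] = np.clip(np.round(s / scale), -qmax, qmax) * scale
        out[i] = q.reshape(-1)
    return out.reshape(x.shape)

def bits_per_element(mantissa_bits, d1=8, k1=16, d2=1, k2=2):
    # Sign + mantissa per element, plus the amortized cost of shared exponents.
    return 1 + mantissa_bits + d1 / k1 + d2 / k2

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
xq = quantize_mx(x)
print(bits_per_element(4), np.max(np.abs(x - xq)))
```

With `d2=0` the microexponent can never shift, so the same function degenerates to plain block floating-point with one exponent per block. That makes it easy to compare the two schemes side by side, in the spirit of the comparisons BDR is meant to support. Under these assumed parameters the amortized storage cost is 1 + 4 + 8/16 + 1/2 = 6 bits per element.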
Taxonomy
Topics: Advanced Data Compression Techniques · Neural Networks and Reservoir Computing · Neural Networks and Applications
