MoR: Mixture Of Representations For Mixed-Precision Training
Bor-Yiing Su, Peter Dykas, Mike Chrzanowski, Jatin Chhugani

TL;DR
This paper introduces MoR, a dynamic quantization framework for mixed-precision training that adaptively selects numerical representations at the tensor and sub-tensor levels while preserving model quality.
Contribution
The paper proposes a novel, property-aware quantization framework called MoR that dynamically chooses between FP8 and BF16 representations at multiple granularities.
Findings
Quantizes 98.38% of tensors to FP8.
Maintains model quality across various quantization partition strategies and datasets.
Could improve the robustness of low-precision training.
Abstract
Mixed-precision training is a crucial technique for scaling deep learning models, but successful mixed-precision training requires identifying and applying the right combination of training methods. This paper presents our preliminary study on Mixture-of-Representations (MoR), a novel per-tensor and sub-tensor level quantization framework that dynamically analyzes a tensor's numerical properties to select among a variety of different representations. Based on this framework, we have proposed and experimented with concrete algorithms that choose dynamically between FP8 and BF16 representations at both per-tensor and sub-tensor granularities. Our universal approach is designed to preserve model quality across various quantization partition strategies and datasets. Our initial findings show that this approach can achieve state-of-the-art results with 98.38% of tensors quantized to the…
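The abstract does not state which numerical properties MoR inspects or what rule it applies, so the following is only a minimal sketch of the general idea under stated assumptions. It simulates a per-tensor scaled FP8 E4M3 round trip, measures the mean relative error, and falls back to BF16 when that error exceeds a threshold; a per-block variant illustrates the sub-tensor granularity. The error metric, the 0.05 threshold, the block size, and every function name (`round_to_e4m3`, `fake_quantize_fp8`, `choose_representation`, `choose_per_block`) are hypothetical choices for illustration, not the authors' algorithm.

```python
import math
from typing import List, Sequence

# E4M3 FP8 constants: 3 mantissa bits, max finite value 448, and a minimum
# normal exponent of -6 (subnormals keep the 2**-9 spacing below 2**-6).
E4M3_MAX = 448.0
E4M3_MANTISSA_BITS = 3
E4M3_MIN_EXP = -6


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (round-half-even, saturating)."""
    if x == 0.0 or math.isnan(x):
        return x
    if math.isinf(x):
        return math.copysign(E4M3_MAX, x)
    mag = abs(x)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
    _, e = math.frexp(mag)
    exp = max(e - 1, E4M3_MIN_EXP)  # subnormals share the min-exponent spacing
    quantum = 2.0 ** (exp - E4M3_MANTISSA_BITS)
    rounded = round(mag / quantum) * quantum  # Python's round() is half-to-even
    return math.copysign(min(rounded, E4M3_MAX), x)  # saturate at 448


def fake_quantize_fp8(values: Sequence[float]) -> List[float]:
    """Per-tensor scaled FP8 round trip: scale so that amax maps to 448."""
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        return list(values)
    scale = E4M3_MAX / amax
    return [round_to_e4m3(v * scale) / scale for v in values]


def mean_relative_error(ref: Sequence[float], approx: Sequence[float]) -> float:
    """Mean of |ref - approx| / |ref| over the nonzero entries of ref."""
    errs = [abs(r - a) / abs(r) for r, a in zip(ref, approx) if r != 0.0]
    return sum(errs) / len(errs) if errs else 0.0


def choose_representation(values: Sequence[float], threshold: float = 0.05) -> str:
    """Hypothetical per-tensor rule: keep FP8 if its error is acceptable, else BF16."""
    err = mean_relative_error(values, fake_quantize_fp8(values))
    return "fp8" if err <= threshold else "bf16"


def choose_per_block(values: Sequence[float], block_size: int = 4,
                     threshold: float = 0.05) -> List[str]:
    """Sub-tensor variant: apply the same rule independently to each block."""
    return [
        choose_representation(values[i:i + block_size], threshold)
        for i in range(0, len(values), block_size)
    ]
```

For example, `choose_per_block([1.0, 2.0, 4.0, 0.5, 1e6, 1e-6, 1e6, 1e-6])` keeps the first block in FP8 because it survives the round trip exactly, while the second block falls back to BF16: after scaling, its tiny entries underflow to zero. Deciding per block rather than per tensor lets a single outlier-heavy region fall back to higher precision without forcing the whole tensor to do so.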
Taxonomy
Topics: Tensor decomposition and applications · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
