MoR: Mixture Of Representations For Mixed-Precision Training

Bor-Yiing Su; Peter Dykas; Mike Chrzanowski; Jatin Chhugani

arXiv:2512.22804·cs.LG·December 30, 2025

TL;DR

This paper introduces MoR, a dynamic quantization framework for mixed-precision training that adaptively selects representations at tensor and sub-tensor levels, achieving high accuracy and robustness.

Contribution

The paper proposes MoR, a novel, property-aware quantization framework that dynamically chooses between FP8 and BF16 representations at per-tensor and sub-tensor granularities.

Findings

01. 98.38% of tensors are quantized to FP8.

02. Model quality is maintained across various quantization partition strategies.

03. The approach has the potential to improve the robustness of low-precision training.

Abstract

Mixed-precision training is a crucial technique for scaling deep learning models, but successful mixed-precision training requires identifying and applying the right combination of training methods. This paper presents our preliminary study on Mixture-of-Representations (MoR), a novel, per-tensor and sub-tensor level quantization framework that dynamically analyzes a tensor's numerical properties to select between a variety of different representations. Based on the framework, we have proposed and experimented with concrete algorithms that choose dynamically between FP8 and BF16 representations at both per-tensor and sub-tensor granularities. Our universal approach is designed to preserve model quality across various quantization partition strategies and datasets. Our initial findings show that this approach can achieve state-of-the-art results with 98.38% of tensors quantized to the…
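
To make the idea of property-aware selection concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the abstract does not state the selection criterion, so the metric (mean element-wise relative error after a simulated FP8 E4M3 round trip), the threshold of 0.05, and all function names are assumptions chosen for illustration. The sketch applies per-tensor amax scaling, rounds to a 3-bit mantissa, and keeps FP8 only when the round-trip error stays below the threshold. The same test is then applied block by block to imitate sub-tensor granularity.

```python
import math

E4M3_MAX = 448.0          # largest finite magnitude in FP8 E4M3
E4M3_MIN_NORMAL_EXP = -6  # smallest normal exponent; smaller values are subnormal


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-like value (3 mantissa bits, saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)            # saturate instead of overflowing
    _, e = math.frexp(mag)                 # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)  # clamp to the subnormal range
    step = 2.0 ** (exp - 3)                # spacing of representable values
    return sign * min(round(mag / step) * step, E4M3_MAX)


def fake_quantize_fp8(values):
    """Scale by amax, round to E4M3, and scale back (simulated FP8 round trip)."""
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        return list(values)
    scale = E4M3_MAX / amax
    return [round_to_e4m3(v * scale) / scale for v in values]


def mean_relative_error(values, approx):
    """Mean of |v - a| / |v| over the nonzero entries (assumed metric)."""
    errs = [abs(v - a) / abs(v) for v, a in zip(values, approx) if v != 0.0]
    return sum(errs) / len(errs) if errs else 0.0


def choose_representation(values, threshold=0.05):
    """Pick "FP8" if the round-trip error is small, else fall back to "BF16"."""
    err = mean_relative_error(values, fake_quantize_fp8(values))
    return "FP8" if err <= threshold else "BF16"


def choose_per_block(values, block_size, threshold=0.05):
    """Apply the same decision independently to each contiguous block."""
    return [
        choose_representation(values[i:i + block_size], threshold)
        for i in range(0, len(values), block_size)
    ]


well_behaved = [1.0, 2.0, 3.0, 4.0]
wide_range = [1000.0, 0.001, 0.001, 0.001]  # small entries flush to zero in FP8
print(choose_representation(well_behaved))
print(choose_representation(wide_range))
print(choose_per_block(well_behaved + wide_range, block_size=4))
```

In this toy setup, a block with a narrow dynamic range stays in FP8, while a block whose small entries are flushed to zero under a shared scale falls back to BF16. A real implementation would operate on GPU tensors and would likely use a cheaper criterion than a full round trip.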

Taxonomy

Topics: Tensor decomposition and applications · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications