LATMiX: Learnable Affine Transformations for Microscaling Quantization of LLMs
Ofir Gordon, Lior Dikstein, Arnon Netzer, Idan Achituve, Hai Victor Habi

TL;DR
LATMiX introduces learnable affine transformations to improve microscaling (MX) quantization of large language models, reducing quantization error and improving accuracy in low-bit settings.
Contribution
The paper provides a theoretical analysis of transformations under MX quantization and proposes LATMiX, a learnable affine transformation method that improves quantization robustness.
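Here is a rough illustration of why an affine transformation of activations can be applied without changing the model's output. Suppose the activations x entering a linear layer y = Wx + b are replaced by x' = Ax + t, with A invertible. The layer can then absorb the inverse exactly by using W' = W A^-1 and b' = b - W A^-1 t. In full precision the output is unchanged, and only the tensor that gets quantized (x' instead of x) differs.

The sketch below assumes this standard folding scheme. It is not taken from the paper, it does not show how LATMiX learns A and t, and all function names (`fold_affine`, `inverse`, `matmul`, `matvec`) are hypothetical. It uses plain Python lists so that it stays self-contained.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix-matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fold_affine(w: Matrix, b: Vector, a: Matrix, t: Vector) -> Tuple[Matrix, Vector]:
    """Hypothetical helper: given y = W x + b and x' = A x + t,
    return (W', b') such that y = W' x' + b', i.e.
    W' = W A^-1 and b' = b - W A^-1 t (assumes A is invertible)."""
    w_new = matmul(w, inverse(a))
    shift = matvec(w_new, t)
    b_new = [bi - si for bi, si in zip(b, shift)]
    return w_new, b_new


# Usage: the folded layer applied to x' reproduces the original output.
W = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
b = [0.1, -0.2]
A = [[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]  # invertible (det = 1)
t = [0.5, -1.0, 0.0]
x = [1.0, 10.0, -2.0]

x_t = [ax + ti for ax, ti in zip(matvec(A, x), t)]  # transformed activations
W2, b2 = fold_affine(W, b, A, t)
y_ref = [yi + bi for yi, bi in zip(matvec(W, x), b)]
y_new = [yi + bi for yi, bi in zip(matvec(W2, x_t), b2)]
print(y_ref, y_new)
```

The point of the identity is that the transformation costs nothing in exact arithmetic. Its only effect is on the distribution of x', and therefore on the quantization error when x' is quantized.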
Findings
LATMiX achieves consistent accuracy improvements over baselines.
Experiments demonstrate effectiveness across various model sizes.
Theoretical analysis guides the design of learnable transformations.
Abstract
Post-training quantization (PTQ) is a widely used approach for reducing the memory and compute costs of large language models (LLMs). Recent studies have shown that applying invertible transformations to activations can significantly improve quantization robustness by reducing activation outliers; however, existing approaches are largely restricted to rotation or Hadamard-based transformations. Moreover, most studies have focused on traditional quantization schemes, whereas modern hardware increasingly supports the microscaling (MX) data format. Attempts to combine the two have shown severe performance degradation, leading prior work to impose assumptions on the transformations. In this work, we take a complementary perspective. First, we provide a theoretical analysis of transformations under MX quantization by deriving a bound on the quantization error. Our analysis emphasizes the…
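The following is a minimal sketch of MX quantization. It assumes the MXFP4 variant from the OCP Microscaling specification: blocks of 32 values share one power-of-two (E8M0) scale, and each element is stored in FP4 (E2M1). Values that exceed the largest E2M1 magnitude after scaling are saturated. Rounding here goes to the nearest grid point, with ties broken toward the smaller magnitude, which is a simplification of the specification's rounding rules. The names are hypothetical.

The example shows why activation outliers are harmful under MX. A single large value raises its block's shared scale and flushes the block's small values to zero. It does not reproduce the paper's error bound.

```python
import math
from typing import List

# Assumed MXFP4 parameters: FP4 (E2M1) magnitudes, blocks of 32 elements.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest E2M1 value (6.0 = 1.5 * 2**2)
BLOCK_SIZE = 32


def quantize_block(block: List[float]) -> List[float]:
    """Fake-quantize one block: a shared power-of-two scale plus E2M1 elements."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Shared exponent from the block maximum, clamped to the E8M0 range.
    shared_exp = max(-127, min(127, math.floor(math.log2(amax)) - E2M1_EMAX))
    scale = 2.0 ** shared_exp
    out = []
    for v in block:
        mag = min(abs(v) / scale, E2M1_GRID[-1])  # saturate at 6.0
        # Nearest grid point; min() keeps the first (smaller) one on ties.
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(q * scale, v))
    return out


def mx_quantize(x: List[float], block_size: int = BLOCK_SIZE) -> List[float]:
    """Quantize a vector block by block (the last block may be shorter)."""
    out: List[float] = []
    for i in range(0, len(x), block_size):
        out.extend(quantize_block(x[i:i + block_size]))
    return out


def mse(a: List[float], b: List[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Usage: the same 31 small values, with and without one outlier in the block.
smooth = [0.3 * ((i % 7) - 3) for i in range(BLOCK_SIZE)]  # values in [-0.9, 0.9]
with_outlier = smooth[:-1] + [40.0]

err_smooth = mse(smooth[:-1], mx_quantize(smooth)[:-1])
err_outlier = mse(with_outlier[:-1], mx_quantize(with_outlier)[:-1])
print(err_smooth, err_outlier)  # the outlier inflates the error on the other values
```

Because the scale is shared per block rather than per tensor, a transformation that redistributes an outlier's energy can change the error of every other element in that block. This is what makes transformations under MX worth analyzing separately.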
