Nanoscaling Floating-Point (NxFP): NanoMantissa, Adaptive Microexponents, and Code Recycling for Direct-Cast Compression of Large Language Models

Yun-Chen Lo; Gu-Yeon Wei; David Brooks

arXiv:2412.19821·cs.AR·December 31, 2024

TL;DR

This paper introduces NxFP, a novel low-bit floating-point format for large language models that improves accuracy and reduces memory footprint compared to existing Microscaling standards.

Contribution

NxFP proposes NanoMantissa, Adaptive Microexponent, and Code Recycling techniques to enhance low-bit floating-point representation for LLMs, addressing key challenges in Microscaling.

Findings

01. Outperforms MxFP by up to 0.64 perplexity points.

02. Achieves up to 30% accuracy improvement on MMLU benchmarks.

03. Reduces memory footprint by up to 16%.

Abstract

As cutting-edge large language models (LLMs) continue to transform various industries, their fast-growing model size and sequence length have led to memory traffic and capacity challenges. Recently, AMD, Arm, Intel, Meta, Microsoft, NVIDIA, and Qualcomm have proposed a Microscaling standard (Mx), which augments block floating-point with microexponents to achieve promising perplexity-to-footprint trade-offs. However, Microscaling suffers from significant perplexity degradation on modern LLMs at fewer than six bits. This paper profiles modern LLMs and identifies three main challenges of low-bit Microscaling formats: inaccurate tracking of outliers, vacant quantization levels, and wasted binary codes. In response, Nanoscaling (NxFP) proposes three techniques, namely NanoMantissa, Adaptive Microexponent, and Code Recycling, to enable better accuracy and smaller memory footprint than…
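
To make the Microscaling baseline concrete, here is a minimal sketch of 4-bit block quantization with a shared power-of-two scale. It is not the paper's NxFP method. It assumes the FP4 (E2M1) element values, the block size of 32, and the 8-bit shared scale from the public OCP Microscaling specification; the function names are hypothetical, and rounding ties are broken toward the smaller magnitude for simplicity.

```python
import math

# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
# Assumption: values taken from the OCP Microscaling (MX) specification.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest E2M1 binade (4.0 == 2**2)
BLOCK_SIZE = 32    # elements sharing one scale (MX default)
SCALE_BITS = 8     # width of the shared power-of-two scale
ELEM_BITS = 4      # width of each element


def shared_exponent(block):
    """Exponent of the power-of-two scale, set by the block's largest magnitude."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0
    return math.floor(math.log2(amax)) - E2M1_EMAX


def quantize_block(block):
    """Return (shared exponent, dequantized values) for one block."""
    exp = shared_exponent(block)
    scale = 2.0 ** exp
    out = []
    for v in block:
        mag = abs(v) / scale
        # Nearest representable magnitude; anything above 6.0 saturates to 6.0.
        # Ties go to the smaller magnitude (a simplification of this sketch).
        q = min(E2M1_MAGNITUDES, key=lambda m: abs(m - mag))
        out.append(math.copysign(q, v) * scale)
    return exp, out


def bits_per_element(block_size=BLOCK_SIZE):
    """Average storage cost once the shared scale is amortized over the block."""
    return ELEM_BITS + SCALE_BITS / block_size


def distinct_values():
    """Count distinct real values among the 16 sign/magnitude codes."""
    codes = [s * m for s in (1.0, -1.0) for m in E2M1_MAGNITUDES]
    return len(set(codes))  # +0.0 and -0.0 compare equal


exp, deq = quantize_block([8.0, 0.1, -3.0, 1.0])
print(exp, deq)
print(bits_per_element(), distinct_values())
```

In this toy block the largest value, 8.0, fixes the shared scale, so the small value 0.1 rounds to zero. The 16 four-bit codes also cover only 15 distinct values because +0 and -0 coincide, which may be one instance of the "wasted binary code" the abstract mentions; the abstract itself does not say.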

Taxonomy

Topics: Natural Language Processing Techniques