Lossless Compression of Neural Network Components: Weights, Checkpoints, and K/V Caches in Low-Precision Formats

Anat Heilper; Doron Singer

arXiv:2508.19263·cs.LG·August 28, 2025

TL;DR

This paper extends ZipNN-style lossless compression to low-precision floating-point formats such as FP8 and FP4, reporting compression ratios of up to 62% for BF16 and 83% for FP8, with the aim of making large models cheaper to store and transmit.

Contribution

The paper introduces a compression method for low-precision formats that entropy-codes the exponent and mantissa components independently, and evaluates it on model weights and K/V cache tensors in LLMs.

Findings

01 Compression ratios up to 83% for FP8

02 Effective compression of K/V cache tensors

03 Extension of ZipNN to lower-precision formats

Abstract

As deep learning models grow and deployment becomes more widespread, reducing the storage and transmission costs of neural network weights has become increasingly important. While prior work such as ZipNN has shown that lossless compression methods, particularly those based on Huffman encoding of floating-point exponents, can significantly reduce model sizes, these techniques have primarily been applied to higher-precision formats such as FP32 and BF16. In this work, we extend the ZipNN approach to lower-precision floating-point formats, specifically FP8 and FP4, which are gaining popularity for efficient inference. We design a compression method that separates the exponent and mantissa components and compresses each independently using entropy coding. Our evaluation shows compression ratios up to 62% for BF16 and 83% for FP8. We also investigate the compressibility of key-value (K/V) cache…
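
The exponent/mantissa separation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the common FP8 E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits), groups the sign bit with the mantissa as an illustrative choice, and estimates the compressed size from Huffman code lengths while ignoring the cost of storing the code tables. All function names are hypothetical.

```python
import heapq
from collections import Counter


def split_fp8_e4m3(raw: bytes):
    """Split FP8 E4M3 bytes into an exponent stream and a sign+mantissa stream.

    Assumed layout per byte: bit 7 = sign, bits 6..3 = exponent, bits 2..0 = mantissa.
    """
    exponents = [(b >> 3) & 0x0F for b in raw]
    sign_mantissa = [((b >> 7) << 3) | (b & 0x07) for b in raw]
    return exponents, sign_mantissa


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built on `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries are (weight, unique tiebreak, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths


def huffman_bits(symbols):
    """Total payload bits needed to Huffman-code `symbols` (tables excluded)."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


def estimate_compressed_fraction(raw: bytes) -> float:
    """Estimated compressed size divided by original size for an FP8 byte buffer."""
    exponents, sign_mantissa = split_fp8_e4m3(raw)
    total_bits = huffman_bits(exponents) + huffman_bits(sign_mantissa)
    return total_bits / (8 * len(raw))
```

Calling `estimate_compressed_fraction` on the raw bytes of an FP8 tensor gives a rough idea of how much each stream contributes. In trained weights the exponent stream is typically the more skewed of the two, which is the property that exponent-focused Huffman coding exploits.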
