Single-Stage Huffman Encoder for ML Compression
Aditya Agrawal, Albert Magyar, Hiteshwar Eswaraiah, Patrick Sheridan, Pradeep Janedula, Ravi Krishnan Venkatesan, Krishna Nair, Ravi Iyer

TL;DR
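This paper introduces a single-stage Huffman encoder that uses fixed codebooks derived from the average symbol distribution of previous data batches. This removes the overhead of on-the-fly frequency analysis, codebook generation and codebook transmission, enabling low-latency compression of the data exchanged when training and serving large language models.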
Contribution
It proposes a novel single-stage Huffman encoding method with fixed codebooks derived from prior data, eliminating the computational, latency and data overheads of conventional three-stage Huffman coding in latency-sensitive ML communication.
Findings
Achieves compression within 0.5% of per-shard Huffman coding
Comes within 1% of the ideal Shannon compressibility
Demonstrates high statistical similarity of tensors across layers and shards
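The findings compare bits per symbol on the same data: the Shannon entropy (the ideal bound), a Huffman code built separately for each shard, and a single fixed codebook built from earlier shards. A minimal Python sketch of that comparison on synthetic data follows. It assumes 8-bit symbols and defines compressibility as the fraction of bits saved relative to 8-bit fixed-width symbols; these definitions, the `eps` smoothing term and all function names are hypothetical choices for illustration rather than the paper's exact methodology, and the printed numbers say nothing about Gemma 2B.

```python
import heapq
import itertools
import math
import random
from collections import Counter

SYMBOL_BITS = 8  # assumption: tensors are compressed as a stream of 8-bit symbols

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} of a Huffman code for freqs."""
    items = [(f, s) for s, f in freqs.items() if f > 0]
    if len(items) == 1:
        return {items[0][1]: 1}
    tie = itertools.count()  # tie-breaker so heapq never compares the lists
    heap = [(f, next(tie), [s]) for f, s in items]
    heapq.heapify(heap)
    lengths = {s: 0 for _, s in items}
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        for s in a + b:  # every merge pushes these symbols one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), a + b))
    return lengths

def entropy_bits(freqs):
    """Shannon entropy in bits per symbol: the ideal lower bound."""
    total = sum(freqs.values())
    return -sum(f / total * math.log2(f / total) for f in freqs.values() if f > 0)

def avg_code_length(freqs, lengths):
    """Expected bits per symbol when coding freqs with the given code lengths."""
    total = sum(freqs.values())
    return sum(f * lengths[s] for s, f in freqs.items()) / total

def compressibility(bits_per_symbol):
    """Fraction of bits saved relative to fixed-width SYMBOL_BITS symbols."""
    return 1 - bits_per_symbol / SYMBOL_BITS

def fixed_codebook(previous_shards, eps=1e-6):
    """Huffman lengths from the averaged distribution of earlier shards.
    eps is a hypothetical smoothing term so every symbol gets a codeword."""
    avg = {s: eps for s in range(2 ** SYMBOL_BITS)}
    for shard in previous_shards:
        total = sum(shard.values())
        for s, f in shard.items():
            avg[s] += f / total / len(previous_shards)
    return huffman_code_lengths(avg)

# Synthetic, skewed 8-bit shards (illustrative only, not data from the paper).
rng = random.Random(0)
weights = [0.9 ** i for i in range(2 ** SYMBOL_BITS)]
shards = [Counter(rng.choices(range(2 ** SYMBOL_BITS), weights=weights, k=20_000))
          for _ in range(5)]

fixed = fixed_codebook(shards[:4])  # built once from "previous" shards
new = shards[4]                     # a later shard, coded with no frequency analysis
h = entropy_bits(new)
per_shard = avg_code_length(new, huffman_code_lengths(new))
single_stage = avg_code_length(new, fixed)

print(f"Shannon      : {compressibility(h):.4f}")
print(f"per-shard    : {compressibility(per_shard):.4f}")
print(f"fixed (1-stg): {compressibility(single_stage):.4f}")
```

By construction, the fixed codebook can never beat the per-shard Huffman code on that shard, and the per-shard code always lies within one bit per symbol of the entropy; the fixed codebook's extra cost is the price paid for skipping per-shard frequency analysis and codebook transmission.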
Abstract
Training and serving Large Language Models (LLMs) require partitioning data across multiple accelerators, where collective operations are frequently bottlenecked by network bandwidth. Lossless compression using Huffman codes is an effective way to alleviate the issue; however, its three-stage design, which requires on-the-fly frequency analysis, codebook generation and transmission of the codebook along with the data, introduces computational, latency and data overheads that are prohibitive for latency-sensitive scenarios such as die-to-die communication. This paper proposes a single-stage Huffman encoder that eliminates these overheads by using fixed codebooks derived from the average probability distribution of previous data batches. Through our analysis of the Gemma 2B model, we demonstrate that tensors exhibit high statistical similarity across layers and shards. Using this approach we achieve…
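The difference between the three-stage design and a single stage can be illustrated with a minimal Python sketch: the codebook is built once, offline, from the averaged distribution of previous batches, and each new batch is then encoded by table lookup alone. It assumes 8-bit symbols, represents codewords as '0'/'1' strings for readability (a real encoder would pack bits), and adds a small hypothetical smoothing term so that symbols unseen in earlier batches still receive a codeword. The function names and batch contents are illustrative, not taken from the paper.

```python
import heapq
import itertools
from collections import Counter

def build_codebook(freqs):
    """Huffman codebook {symbol: bit string} for a frequency/probability map."""
    tie = itertools.count()  # tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items() if f > 0]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def average_distribution(batches, alphabet):
    """Mean of per-batch symbol probabilities, smoothed so every symbol in
    alphabet gets a codeword (the 1e-6 smoothing is an assumption)."""
    avg = {s: 1e-6 for s in alphabet}
    for batch in batches:
        for s, c in Counter(batch).items():
            avg[s] += c / len(batch) / len(batches)
    return avg

# Offline, once: derive a fixed codebook from previous batches.
ALPHABET = range(256)  # assumption: 8-bit symbols
previous_batches = [bytes([0, 0, 0, 1, 1, 2, 255]), bytes([0, 0, 1, 1, 1, 2, 3])]
CODEBOOK = build_codebook(average_distribution(previous_batches, ALPHABET))
DECODE = {code: s for s, code in CODEBOOK.items()}

# Online, per batch: a single stage of table lookups. No frequency counting,
# no tree building, and no codebook sent along with the payload.
def encode(batch):
    return "".join(CODEBOOK[s] for s in batch)

def decode(bits):
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in DECODE:  # prefix-free, so the first match is the symbol
            out.append(DECODE[cur])
            cur = ""
    return bytes(out)

new_batch = bytes([0, 1, 0, 2, 0, 1, 7])  # symbol 7 never appeared before
bits = encode(new_batch)
assert decode(bits) == new_batch
print(len(bits), "bits vs", 8 * len(new_batch), "uncompressed")
```

Because `CODEBOOK` is computed ahead of time, the per-batch path involves no frequency analysis and the receiver needs no transmitted codebook, which is what makes the approach attractive for latency-sensitive links. The trade-off is that a fixed codebook is slightly suboptimal for any individual batch.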
Taxonomy
Topics: Algorithms and Data Compression · Speech Recognition and Synthesis · Advanced Data Compression Techniques
