eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal

TL;DR
This paper introduces eDKM, a memory-efficient implementation of Differentiable KMeans Clustering (DKM). eDKM makes train-time weight clustering practical for large language models by substantially reducing training memory while maintaining high accuracy.
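The summary names DKM without describing it. The following is a minimal sketch of the general DKM idea as we understand it, not the authors' code. Each weight attends to every centroid through a softmax over negative distances scaled by a temperature `tau`. Centroids are then re-estimated as attention-weighted means, and each weight is replaced by an attention-weighted mix of centroids. The sketch makes several assumptions: scalar weights, absolute distance, fixed iteration count, and the hypothetical names `attention_map` and `dkm_cluster`. It uses pure Python, so no gradients are computed. In an autograd framework every step here is smooth and could be backpropagated through.

```python
import math

def attention_map(weights, centroids, tau):
    """Row i is a softmax over -|w_i - c_j| / tau across centroids j."""
    rows = []
    for w in weights:
        logits = [-abs(w - c) / tau for c in centroids]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        s = sum(exps)
        rows.append([e / s for e in exps])
    return rows

def dkm_cluster(weights, centroids, tau=0.05, iters=5):
    """Soft k-means over scalar weights (illustrative DKM-style sketch)."""
    c = list(centroids)
    for _ in range(iters):
        attn = attention_map(weights, c, tau)
        new_c = []
        for j in range(len(c)):
            col = [row[j] for row in attn]
            denom = sum(col)
            # Attention-weighted mean of weights; keep the old centroid if unused.
            new_c.append(sum(a * w for a, w in zip(col, weights)) / denom if denom > 0 else c[j])
        c = new_c
    attn = attention_map(weights, c, tau)
    # Soft reconstruction: each weight becomes a mix of centroids.
    clustered = [sum(a * cj for a, cj in zip(row, c)) for row in attn]
    return clustered, c, attn

weights = [-1.0, -0.9, 0.8, 1.1]
clustered, centroids, attn = dkm_cluster(weights, [-0.5, 0.5], tau=0.05)
print([round(x, 3) for x in clustered])  # prints [-0.95, -0.95, 0.95, 0.95]
```

A small `tau` makes the soft assignment close to hard k-means. A larger one spreads each weight across several centroids.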
Contribution
The paper proposes a new memory-reduction technique for DKM that enables train-time compression of large LLMs without excessive memory overhead.
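The summary does not say how the memory reduction works. The sketch below shows one idea that fits the description. It is offered as an assumption, not as eDKM's documented mechanism. When weights are stored in a low-precision format (16-bit floats have at most 65,536 distinct bit patterns), many weights share a value. The N×K attention map can then be replaced by a U×K table over the U unique values plus an N-entry index. The function names are hypothetical.

```python
import math

def attention_row(w, centroids, tau):
    # Softmax over negative distances from one weight to every centroid.
    logits = [-abs(w - c) / tau for c in centroids]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def full_attention(weights, centroids, tau):
    # Baseline: one row per weight, i.e. N x K entries.
    return [attention_row(w, centroids, tau) for w in weights]

def uniquified_attention(weights, centroids, tau):
    # Hypothetical memory-saving variant: one row per distinct weight value
    # (U x K entries) plus an integer index per weight pointing at its row.
    uniques = sorted(set(weights))
    position = {v: i for i, v in enumerate(uniques)}
    index = [position[w] for w in weights]
    table = [attention_row(u, centroids, tau) for u in uniques]
    return table, index

weights = [0.5, -0.25, 0.5, 0.5, -0.25, 1.0, 0.5, -0.25]
centroids = [-0.5, 0.5]
table, index = uniquified_attention(weights, centroids, tau=0.1)
full = full_attention(weights, centroids, tau=0.1)
# Expanding the compact table through the index recovers the full map exactly.
assert all(table[i] == row for i, row in zip(index, full))
print(len(full) * len(centroids), "entries vs", len(table) * len(centroids))  # prints 16 entries vs 6
```

The saving grows with how often values repeat. A multi-billion-parameter model stored in 16 bits has far more weights than distinct values, so U stays bounded while N is huge.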
Findings
Compressed LLaMA 7B from 12.6 GB to 2.5 GB using eDKM.
Reduced train-time memory footprint of a decoder layer by 130×.
Achieved competitive accuracy on LLM benchmarks.
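The reported sizes imply an overall compression ratio and an average storage cost per weight. The sketch below works out both. It assumes the 12.6 GB baseline stores 16-bit weights. The resulting average would also absorb any overhead, such as centroid tables or parts of the model left uncompressed, which the summary does not break down.

```python
baseline_gb = 12.6     # LLaMA 7B size reported in the summary
compressed_gb = 2.5    # size after eDKM, as reported
baseline_bits = 16     # assumption: the baseline stores 16-bit weights

ratio = baseline_gb / compressed_gb
effective_bits = baseline_bits * compressed_gb / baseline_gb
print(f"compression ratio: {ratio:.2f}x")               # prints compression ratio: 5.04x
print(f"effective bits per weight: {effective_bits:.2f}")  # prints effective bits per weight: 3.17
```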
Abstract
Since Large Language Models (LLMs) have demonstrated high-quality performance on many complex language tasks, there is great interest in bringing them to mobile devices for faster responses and better privacy protection. However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, weight clustering, a form of non-linear quantization, is one of the leading candidates for LLM compression and is supported by modern smartphones. Yet its training overhead is prohibitively large for LLM fine-tuning. In particular, Differentiable KMeans Clustering, or DKM, has shown the state-of-the-art trade-off between compression ratio and accuracy regression, but its large memory complexity makes it nearly impossible to apply to train-time LLM compression. In this paper, we propose a…
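The arithmetic below estimates the size of a single DKM attention map, which shows why DKM's memory complexity is hard to fit during LLM training. It rests on several assumptions. There is one N×K map of fp32 entries per weight matrix, with K = 2**bits centroids. The matrix is a 4096×4096 projection, matching LLaMA 7B's hidden size of 4096. The helper `attention_map_bytes` is hypothetical. A real training step may also keep more than one such map alive for the backward pass, so this is a lower bound under those assumptions.

```python
def attention_map_bytes(rows, cols, bits, bytes_per_entry=4):
    """Bytes for one N x K attention map over a rows x cols weight matrix.

    K = 2**bits centroids; bytes_per_entry=4 assumes fp32 entries.
    """
    n_weights = rows * cols
    n_centroids = 2 ** bits
    return n_weights * n_centroids * bytes_per_entry

# Hypothetical 4096 x 4096 projection clustered to 3 bits (8 centroids).
attn = attention_map_bytes(4096, 4096, bits=3)
weights_fp16 = 4096 * 4096 * 2
print(attn / 2**20, "MiB attention map vs", weights_fp16 / 2**20, "MiB of fp16 weights")
# prints 512.0 MiB attention map vs 32.0 MiB of fp16 weights
```

Under these assumptions the map is 16 times larger than the weights it clusters, and a decoder layer contains several such matrices.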
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
