eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

Minsik Cho; Keivan A. Vahid; Qichen Fu; Saurabh Adya; Carlo C Del Mundo; Mohammad Rastegari; Devang Naik; Peter Zatloukal

arXiv:2309.00964 · cs.LG · September 15, 2023

TL;DR

This paper introduces eDKM, a memory-efficient implementation of Differentiable KMeans Clustering, enabling effective train-time weight clustering for large language models, significantly reducing memory usage while maintaining high accuracy.

Contribution

The paper proposes a novel memory reduction technique for DKM, allowing large-scale LLM compression during training without excessive memory overhead.

Findings

1. Compressed LLaMA 7B from 12.6 GB to 2.5 GB using eDKM.
2. Reduced train-time memory footprint of a decoder layer by 130×.
3. Achieved competitive accuracy on LLM benchmarks.

Abstract

Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, weight-clustering, a form of non-linear quantization, is one of the leading candidates for LLM compression, and supported by modern smartphones. Yet, its training overhead is prohibitively significant for LLM fine-tuning. Especially, Differentiable KMeans Clustering, or DKM, has shown the state-of-the-art trade-off between compression ratio and accuracy regression, but its large memory complexity makes it nearly impossible to apply to train-time LLM compression. In this paper, we propose a…
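To make "weight clustering, a form of non-linear quantization" concrete, the following is a minimal sketch of the general idea, not the paper's method. It uses plain, non-differentiable k-means in NumPy rather than DKM or eDKM, and the function names (`cluster_weights`, `reconstruct`), the 3-bit setting, and the quantile initialisation are all illustrative assumptions. Each weight is replaced by a small integer index into a shared codebook of centroids.

```python
import numpy as np


def cluster_weights(weights, n_bits=3, n_iters=20):
    """Plain 1-D k-means over a weight tensor (illustrative, NOT DKM/eDKM).

    Returns per-weight centroid indices and the centroid codebook.
    """
    flat = weights.reshape(-1).astype(np.float64)
    k = 2 ** n_bits  # number of centroids representable with n_bits
    # Assumption: initialise centroids at evenly spaced quantiles of the weights.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(n_iters):
        # Assign every weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of the weights assigned to it.
        for j in range(k):
            members = flat[idx == j]
            if members.size:
                centroids[j] = members.mean()
    # Final assignment against the updated centroids.
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return idx.astype(np.uint8).reshape(weights.shape), centroids


def reconstruct(indices, centroids):
    """Look up each index in the codebook to rebuild an approximate tensor."""
    return centroids[indices]


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for a weight matrix
indices, codebook = cluster_weights(w, n_bits=3)
w_hat = reconstruct(indices, codebook)
```

Only `indices` (here 3 bits of information per weight) and the small `codebook` need to be stored. Because the centroids follow the weight distribution instead of a uniform grid, the quantization is non-linear. The hard `argmin` assignment in this sketch is not differentiable; per the abstract, DKM is a differentiable form of k-means clustering, and its large memory complexity is what eDKM targets.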

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques