LCD: Advancing Extreme Low-Bit Clustering for Large Language Models via Knowledge Distillation

Fangxin Liu; Ning Yang; Junping Zhao; Tao Yang; Haibing Guan; Li Jiang

arXiv:2506.12038 · cs.LG · June 17, 2025

TL;DR

This paper introduces LCD, a novel approach combining clustering-based quantization and knowledge distillation to effectively compress large language models to ultra-low bit widths, significantly reducing memory and computation costs.
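
The knowledge-distillation half of this combination trains the compressed (student) model to match the output distribution of the full-precision (teacher) model. The sketch below shows the classic temperature-softened KL-divergence distillation loss on toy logits. It is a generic illustration only: the summary does not state LCD's exact objective. The names `softmax`, `kd_loss` and the default temperature are assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() from overflowing.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) between softened distributions, scaled by T^2
    # (the classic Hinton-style objective; assumed here, not taken from LCD).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return kl * temperature ** 2


# Toy logits: full-precision teacher vs. quantized student (made-up values).
teacher = [2.0, 1.0, 0.1]
student = [1.8, 1.1, 0.3]
print(round(kd_loss(teacher, student), 4))
```

The loss is zero only when the student reproduces the teacher's softened distribution. A higher temperature exposes more of the teacher's relative preferences among non-top tokens.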

Contribution

LCD unifies clustering-based quantization with knowledge distillation, enabling ultra-low bit compression of LLMs while maintaining performance and improving inference speed.
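
Clustering-based quantization replaces each weight with the nearest of a small set of learned centroids. Only a short index per weight plus a shared codebook needs to be stored. The sketch below clusters one weight row with plain 1-D Lloyd's k-means into 2**bits centroids. The even-spaced initialization, the iteration count and the names `cluster_quantize` and `dequantize` are assumptions for illustration. LCD learns its clustering inside the distillation loop, which this sketch does not model.

```python
def cluster_quantize(weights, bits=2, iters=20):
    """Cluster a 1-D list of weights into 2**bits centroids (bits >= 1) with k-means.

    Returns (codebook, indices): weight j is approximated by codebook[indices[j]],
    so only `bits` bits per weight plus the small codebook need to be stored.
    """
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Evenly spaced initial centroids over the weight range (a simple, assumed choice).
    codebook = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    indices = [0] * len(weights)
    for _ in range(iters):
        # Assignment step: map every weight to its nearest centroid.
        indices = [min(range(k), key=lambda c: abs(w - codebook[c])) for w in weights]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [w for w, idx in zip(weights, indices) if idx == c]
            if members:
                codebook[c] = sum(members) / len(members)
    return codebook, indices


def dequantize(codebook, indices):
    # Reconstruct approximate weights from the codebook and per-weight indices.
    return [codebook[i] for i in indices]


row = [-1.0, -0.9, -0.1, 0.0, 0.1, 0.9, 1.0, 1.1]
codebook, indices = cluster_quantize(row, bits=2)
print(codebook, indices)
```

Unlike uniform quantization, the centroids follow the actual weight distribution. This is why clustering tends to lose less accuracy at 2-3 bits.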

Findings

1. Outperforms existing low-bit quantization methods.
2. Achieves up to 6.2x inference speedup.
3. Maintains model accuracy at 2-3 bits.
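
To put "2-3 bits" in storage terms, a clustered layer stores b-bit indices plus a codebook of 2**b centroids per group of weights. The effective cost per weight is therefore b + (codebook_bits * 2**b) / group_size. The sketch below does this arithmetic under assumed parameters: FP16 centroids, a group size of 4096 (one codebook per row of a hypothetical 4096-wide layer) and a hypothetical 7B-parameter model. LCD's actual codebook granularity is not given here, and these memory figures are unrelated to the reported speedup.

```python
def effective_bits(bits, group_size, codebook_bits=16):
    # Each group of `group_size` weights stores `bits`-bit indices plus its own
    # codebook of 2**bits centroids kept at `codebook_bits` precision.
    return bits + codebook_bits * (2 ** bits) / group_size


def model_size_gb(num_params, bits_per_weight):
    # Bytes = params * bits / 8, reported in decimal gigabytes.
    return num_params * bits_per_weight / 8 / 1e9


n = 7e9  # hypothetical 7B-parameter model
for b in (2, 3):
    eb = effective_bits(b, group_size=4096)
    print(b, eb, round(model_size_gb(n, eb), 2), "GB vs", model_size_gb(n, 16), "GB in FP16")
```

Smaller groups give each codebook a better local fit but raise the overhead. For example, 3-bit indices with one FP16 codebook per 128 weights already cost 4 bits per weight.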

Abstract

Large language models (LLMs) have achieved significant progress in natural language processing but are difficult to deploy because of their high memory and computational requirements. Weight quantization is a common way to address these issues, yet effective low-bit compression remains challenging. This paper presents LCD, which unifies the learning of clustering-based quantization within a knowledge distillation framework. Using carefully designed optimization techniques, LCD preserves LLM performance even at ultra-low bit widths of 2-3 bits. Additionally, LCD compresses activations through smoothing and accelerates inference with a lookup-table (LUT)-based design. Experimental results show that LCD outperforms existing methods and delivers up to a 6.2x speedup in inference. Notably, LCD is shown to be more cost-effective, making it a practical solution for real-world applications.
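
Because every clustered weight takes one of only K centroid values, a matrix-vector product can first sum activations into one bucket per centroid. It then needs only K multiplications per output row instead of one per weight: y[r] = Σ_k C_r[k] · Σ_{j: idx[r][j] = k} x[j]. The sketch below shows this generic centroid-bucketing trick next to a dense reference. It is an assumed illustration of why clustered weights suit table-driven kernels, not LCD's actual LUT kernel, which this summary does not describe. The function names are hypothetical.

```python
def lut_matvec(codebooks, indices, x):
    """y[r] = sum_j codebooks[r][indices[r][j]] * x[j], via per-centroid buckets."""
    y = []
    for codebook, row_idx in zip(codebooks, indices):
        # Accumulate activations into one bucket per centroid (additions only).
        buckets = [0.0] * len(codebook)
        for j, k in enumerate(row_idx):
            buckets[k] += x[j]
        # One multiply per centroid instead of one per weight.
        y.append(sum(c * b for c, b in zip(codebook, buckets)))
    return y


def dense_matvec(codebooks, indices, x):
    # Reference: dequantize each weight, then take an ordinary dot product.
    return [sum(cb[k] * xj for k, xj in zip(row, x)) for cb, row in zip(codebooks, indices)]


# Two rows of 2-bit weights, each row with its own 4-entry codebook (toy values).
codebooks = [[-1.0, -0.25, 0.25, 1.0], [-0.5, 0.0, 0.5, 1.5]]
indices = [[0, 3, 2, 1, 3], [2, 2, 0, 3, 1]]
x = [1.0, 2.0, -1.0, 0.5, 4.0]
print(lut_matvec(codebooks, indices, x), dense_matvec(codebooks, indices, x))
```

Both functions compute the same result. The bucketed form trades per-weight multiplies for additions, and the savings grow as rows get wider relative to the codebook size.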

Taxonomy

Topics: Advanced Clustering Algorithms Research · Topic Modeling · Neural Networks and Applications