BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu

TL;DR
BitDistiller is a novel framework combining quantization-aware training and self-distillation to enhance ultra-low precision (sub-4-bit) large language models, achieving superior performance with reduced resource requirements.
Contribution
It introduces a tailored asymmetric quantization and clipping technique, together with a confidence-aware distillation objective, to improve LLMs at ultra-low precision.
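To make the quantization side of this concrete, here is a minimal sketch of generic asymmetric (min/max based) quantization with a clipping ratio. It is not the authors' implementation: the function names `quantize_asymmetric` and `search_clip_ratio`, the candidate ratios, and the choice of weight reconstruction error as the search criterion are all assumptions made for illustration, and the sketch assumes the weight group straddles zero.

```python
def quantize_asymmetric(weights, n_bits=2, clip_ratio=1.0):
    """Fake-quantize a list of floats onto 2**n_bits evenly spaced levels.

    Asymmetric means the grid spans [w_min, w_max] rather than being
    centered on zero. clip_ratio < 1 shrinks both ends of the range
    (assumes min(weights) <= 0 <= max(weights)).
    """
    levels = 2 ** n_bits - 1
    w_min = min(weights) * clip_ratio
    w_max = max(weights) * clip_ratio
    scale = (w_max - w_min) / levels
    if scale == 0.0:
        return list(weights)  # constant group: nothing to quantize
    dequantized = []
    for w in weights:
        q = round((w - w_min) / scale)   # integer code
        q = max(0, min(levels, q))       # clamp clipped outliers
        dequantized.append(q * scale + w_min)
    return dequantized


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_clip_ratio(weights, n_bits=2, candidates=(1.0, 0.9, 0.8, 0.7, 0.6, 0.5)):
    """Pick the candidate ratio with the lowest weight reconstruction error.

    Hypothetical criterion: a real pipeline might instead measure the
    error of layer outputs on calibration data.
    """
    return min(
        candidates,
        key=lambda r: squared_error(weights, quantize_asymmetric(weights, n_bits, r)),
    )


weights = [-1.0, -0.2, 0.1, 0.9]
restored = quantize_asymmetric(weights, n_bits=2)
best_ratio = search_clip_ratio(weights, n_bits=2)
```

With 2 bits the four weights can land on at most four distinct values, and the unclipped range reproduces the smallest and largest weight exactly; clipping trades accuracy on outliers for a finer grid over the remaining weights.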
Findings
Outperforms existing methods in 3-bit and 2-bit settings
Achieves better general language understanding and reasoning
Requires less data and fewer training resources
Abstract
The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD) to boost the performance of LLMs at ultra-low precisions (sub-4-bit). Specifically, BitDistiller first incorporates a tailored asymmetric quantization and clipping technique to maximally preserve the fidelity of quantized weights, and then proposes a novel Confidence-Aware Kullback-Leibler Divergence (CAKLD) objective, which is employed in a self-distillation manner to enable faster convergence and superior model performance. Empirical evaluations demonstrate that BitDistiller significantly…
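The abstract names the CAKLD objective but does not define it in the text above. The sketch below shows one plausible reading, stated as an assumption: a blend of reverse and forward KL divergence between teacher and student token distributions, weighted by a confidence coefficient `gamma` in [0, 1]. Which term `gamma` weights, how `gamma` is estimated, and the function name `confidence_weighted_kl` are all assumptions for illustration; in self-distillation the teacher would be the full-precision copy of the same model and the student its quantized version.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def confidence_weighted_kl(teacher_logits, student_logits, gamma):
    """Blend of reverse and forward KL for one token position.

    Assumption: gamma weights the reverse (mode-seeking) term KL(S || T)
    and 1 - gamma weights the forward (mean-seeking) term KL(T || S).
    """
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    reverse_kl = kl_divergence(p_s, p_t)
    forward_kl = kl_divergence(p_t, p_s)
    return gamma * reverse_kl + (1.0 - gamma) * forward_kl


teacher = [2.0, 0.0, -1.0]   # hypothetical full-precision logits
student = [0.5, 0.3, 0.1]    # hypothetical quantized-model logits
loss = confidence_weighted_kl(teacher, student, gamma=0.7)
```

Under this reading, a `gamma` near 1 pushes the student to concentrate on the teacher's dominant tokens, while a `gamma` near 0 pushes it to cover the teacher's whole distribution.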
Code & Models
- 🤗 Heisenger/TinyLlama_v1.1_2bit_int_three_times_data (model, 3 downloads)
- 🤗 BrownianNotion/TinyLlama_v1.1_2bit_int_3x_data_3_epochs (model, 1 download)
- 🤗 BrownianNotion/Llama-2-7b-hf_2bit_int (model)
- 🤗 Heisenger/Llama-2-7b-hf_1bit_int (model)
- 🤗 fredericowieser/TinyLlama_v1.1_3bit_int_3_bit_int (model, 1 download)
- 🤗 fredericowieser/TinyLlama_v1.1_3bit_nf3 (model, 2 downloads)
- 🤗 VictorFiz/TinyLlama_v1.1_2bit_int_llama2_7b_teacher (model, 1 download)
- 🤗 Niks898/TinyLlama_v1.1_2bit_int_ce_plus_cakld_20 (model, 3 downloads)
- 🤗 Niks898/2-bit-ce_plus_cakld (model)
- 🤗 acoleman/2-bit-baseline (model)
Taxonomy
Topics: Advancements in Semiconductor Devices and Circuit Design · Semiconductor materials and devices · Advanced Memory and Neural Computing
Methods: Knowledge Distillation
