BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

Dayou Du; Yijia Zhang; Shijie Cao; Jiaqi Guo; Ting Cao; Xiaowen Chu; Ningyi Xu

arXiv:2402.10631 · cs.CL · February 19, 2024 · 1 citation



TL;DR

BitDistiller is a framework that combines quantization-aware training with self-distillation to improve large language models at ultra-low precision (sub-4-bit). It outperforms existing methods in 3-bit and 2-bit settings while requiring less data and fewer training resources.

Contribution

It introduces a tailored asymmetric quantization and clipping technique, together with a Confidence-Aware Kullback-Leibler Divergence (CAKLD) distillation objective, to improve LLMs at ultra-low precision.
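This page names the confidence-aware objective but does not give its formula, so the following is only a minimal sketch of the general idea, assuming the loss blends the two directions of KL divergence with a confidence coefficient. The function names, the `gamma` parameter, and the choice of which direction `gamma` weights are all assumptions made for illustration, not the paper's definition.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def confidence_weighted_kl(teacher, student, gamma):
    """Blend reverse and forward KL with a confidence coefficient gamma in [0, 1].

    Assumption: gamma weights the reverse term KL(student || teacher) and
    (1 - gamma) weights the forward term KL(teacher || student).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    reverse = kl_divergence(student, teacher)
    forward = kl_divergence(teacher, student)
    return gamma * reverse + (1.0 - gamma) * forward


teacher_probs = [0.7, 0.2, 0.1]  # hypothetical full-precision model output
student_probs = [0.5, 0.3, 0.2]  # hypothetical quantized model output
loss = confidence_weighted_kl(teacher_probs, student_probs, gamma=0.6)
```

In a self-distillation setup the teacher and student distributions would come from the full-precision and quantized versions of the same model; here they are hard-coded toy values.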

Findings

01 Outperforms existing methods in 3-bit and 2-bit settings

02 Achieves better general language understanding and reasoning

03 Requires less data and fewer training resources

Abstract

The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD) to boost the performance of LLMs at ultra-low precisions (sub-4-bit). Specifically, BitDistiller first incorporates a tailored asymmetric quantization and clipping technique to maximally preserve the fidelity of quantized weights, and then proposes a novel Confidence-Aware Kullback-Leibler Divergence (CAKLD) objective, which is employed in a self-distillation manner to enable faster convergence and superior model performance. Empirical evaluations demonstrate that BitDistiller significantly…
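To make the quantization step concrete, here is a minimal sketch of generic asymmetric (min/max) uniform quantization with a clipping factor, applied to one small group of weights. It is not the paper's tailored scheme: the function name, the `clip_ratio` parameter, and the way clipping scales both ends of the range are assumptions chosen for illustration, and the range is assumed to straddle zero.

```python
def quantize_asymmetric(weights, bits=2, clip_ratio=1.0):
    """Quantize then dequantize a list of floats on an asymmetric uniform grid.

    clip_ratio (hypothetical parameter) scales the [min, max] range before
    quantization; values below 1.0 clip outliers to gain finer resolution.
    """
    levels = 2 ** bits - 1           # highest integer code, e.g. 3 for 2 bits
    lo = min(weights) * clip_ratio   # clipped lower bound
    hi = max(weights) * clip_ratio   # clipped upper bound
    if hi == lo:
        return list(weights)         # constant group: nothing to quantize
    scale = (hi - lo) / levels       # step size between grid points
    zero_point = round(-lo / scale)  # integer code that represents 0.0
    dequantized = []
    for w in weights:
        code = round(w / scale) + zero_point
        code = max(0, min(levels, code))  # clamp to the representable codes
        dequantized.append((code - zero_point) * scale)
    return dequantized


group = [-1.0, -0.2, 0.1, 0.5]       # toy weight group
full_range = quantize_asymmetric(group, bits=2, clip_ratio=1.0)
clipped = quantize_asymmetric(group, bits=2, clip_ratio=0.5)
```

With 2 bits the grid has only four points, so the choice of range matters: clipping trades error on the largest weights for finer spacing among the rest.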



Taxonomy

Topics: Advancements in Semiconductor Devices and Circuit Design · Semiconductor materials and devices · Advanced Memory and Neural Computing

Methods: Knowledge Distillation