DB-LLM: Accurate Dual-Binarization for Efficient LLMs

Hong Chen; Chengtao Lv; Liang Ding; Haotong Qin; Xiabin Zhou; Yifu Ding; Xuebo Liu; Min Zhang; Jinyang Guo; Xianglong Liu; Dacheng Tao

arXiv:2402.11960 · cs.LG · February 20, 2024 · 1 citation

Open Access

TL;DR

This paper introduces DB-LLM, a dual-binarization approach that enhances the accuracy and efficiency of ultra-low-bit quantized large language models by combining flexible binarization and deviation-aware distillation.

Contribution

The paper proposes a novel dual-binarization method with flexible binarization and deviation-aware distillation to improve ultra-low-bit quantization of LLMs.

Findings

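1. Significantly outperforms the current state of the art in ultra-low-bit quantization.
2. Reduces perplexity from 9.64 to 7.23 compared with the prior state-of-the-art ultra-low-bit method.
3. Achieves an additional 20% reduction in computational consumption compared with the state-of-the-art method at the same bit-width.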
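The second finding can be read through the standard definition of perplexity as the exponential of the mean per-token negative log-likelihood. The evaluation model and dataset are not given on this page, so the lines below are only arithmetic on the two reported numbers:

```latex
\begin{aligned}
\mathrm{PPL} &= \exp\Big(-\tfrac{1}{N}\textstyle\sum_{t=1}^{N}\log p(x_t \mid x_{<t})\Big) \\
\tfrac{9.64 - 7.23}{9.64} &= \tfrac{2.41}{9.64} = 0.25 \\
\ln 9.64 - \ln 7.23 &= \ln\tfrac{9.64}{7.23} = \ln\tfrac{4}{3} \approx 0.288\ \text{nats per token}
\end{aligned}
```

That is a 25% relative reduction in perplexity, or roughly 0.288 nats lower average negative log-likelihood per token.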

Abstract

Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment. Quantization emerges as one of the most effective methods for improving the computational efficiency of LLMs. However, existing ultra-low-bit quantization always causes severe accuracy drops. In this paper, we empirically reveal the micro and macro characteristics of ultra-low-bit quantization and present a novel Dual-Binarization method for LLMs, namely DB-LLM. For the micro-level, we take both the accuracy advantage of 2-bit-width and the efficiency advantage of binarization into account, introducing Flexible Dual Binarization (FDB). By splitting 2-bit quantized weights into two independent sets of binaries, FDB ensures the accuracy of representations and introduces flexibility, utilizing the…
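The idea of splitting 2-bit quantized weights into two sets of binaries can be illustrated with a minimal sketch. It assumes asymmetric round-to-nearest 2-bit quantization w ≈ s(q - z) with integer codes q in {0, 1, 2, 3} and a single weight row. The function names (quantize_2bit, split_binaries, binary_dot, dual_binary_dot) and the decoupled-scale "flexible" variant are hypothetical illustrations, not DB-LLM's actual formulation, which the truncated abstract does not spell out.

```python
# Minimal sketch under stated assumptions (not DB-LLM's exact method):
# split 2-bit codes into two {0, 1} planes and compute a dot product as
# two binary dot products plus a zero-point correction term.
from typing import List, Tuple


def quantize_2bit(w: List[float]) -> Tuple[List[int], float, float]:
    """Asymmetric round-to-nearest 2-bit quantization: w ~= s * (q - z)."""
    lo, hi = min(w), max(w)
    s = (hi - lo) / 3 if hi > lo else 1.0  # 4 levels span 3 intervals
    z = -lo / s  # real-valued zero point, left unrounded for simplicity
    q = [min(3, max(0, round(x / s + z))) for x in w]
    return q, s, z


def split_binaries(q: List[int]) -> Tuple[List[int], List[int]]:
    """Write each code as q = 2 * b_hi + b_lo with b_hi, b_lo in {0, 1}."""
    b_hi = [(c >> 1) & 1 for c in q]
    b_lo = [c & 1 for c in q]
    return b_hi, b_lo


def binary_dot(b: List[int], x: List[float]) -> float:
    """Dot product with a {0, 1} vector: only additions, no multiplies."""
    return sum(xi for bi, xi in zip(b, x) if bi)


def dual_binary_dot(b_hi: List[int], b_lo: List[int], alpha_hi: float,
                    alpha_lo: float, offset: float, x: List[float]) -> float:
    """w . x where w = alpha_hi * b_hi + alpha_lo * b_lo + offset elementwise."""
    return (alpha_hi * binary_dot(b_hi, x)
            + alpha_lo * binary_dot(b_lo, x)
            + offset * sum(x))


# Usage: with alpha_hi = 2s, alpha_lo = s and offset = -s*z, the dual-binary
# form reproduces the ordinary 2-bit dequantized dot product.
w = [-0.9, -0.2, 0.35, 1.1]
x = [1.0, 2.0, -1.0, 0.5]
q, s, z = quantize_2bit(w)
b_hi, b_lo = split_binaries(q)
y_dual = dual_binary_dot(b_hi, b_lo, 2 * s, s, -s * z, x)
y_ref = sum(s * (c - z) * xi for c, xi in zip(q, x))

# Hypothetical "flexible" variant: the two scales are decoupled and could be
# tuned independently instead of being tied to 2s and s.
y_flex = dual_binary_dot(b_hi, b_lo, 1.9 * s, 1.1 * s, -s * z, x)
print(q, b_hi, b_lo, y_dual, y_ref, y_flex)
```

Because every code satisfies q = 2·b_hi + b_lo, the two binary planes lose nothing relative to the 2-bit codes when the scales are tied to 2s and s; letting them vary independently, as in y_flex, is one way to picture the added flexibility. A practical kernel would pack each plane into machine words and use bitwise operations and additions rather than Python lists.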

Taxonomy

Topics: Natural Language Processing Techniques · Digital Rights Management and Security · Text and Document Classification Technologies
