TernaryLLM: Ternarized Large Language Model

Tianqi Chen; Zhe Li; Weixiang Xu; Zeyu Zhu; Dong Li; Lu Tian; Emad Barsoum; Peisong Wang; Jian Cheng

arXiv:2406.07177 · cs.LG · June 12, 2024 · 1 citation

Open Access

TL;DR

TernaryLLM introduces a novel ternarization technique for large language models, reducing memory and computational costs while maintaining high performance through learnable scales, shifts, and a knowledge distillation method that preserves semantic information.

Contribution

The paper proposes Dual Learnable Ternarization and Outlier-Friendly Feature Knowledge Distillation, enabling effective extreme quantization of LLMs with improved accuracy and efficiency.
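To make the role of the shift concrete, here is a minimal NumPy sketch of ternarization with a per-row scale and shift. It is an illustration under stated assumptions, not the paper's implementation: the function name `ternarize_rows`, the 0.7 threshold heuristic, and the closed-form scale and shift are all choices made for this example, whereas the paper describes the scales and shifts as learnable. The sketch only shows why a shift can help when weights have a non-zero mean.

```python
import numpy as np

def ternarize_rows(w, shift, delta_factor=0.7):
    """Approximate each row of w as scale * t + shift, with t in {-1, 0, +1}.

    delta_factor=0.7 is a threshold heuristic assumed for this sketch.
    """
    centered = w - shift
    magnitude = np.abs(centered)
    # Entries whose magnitude falls below the per-row threshold become 0.
    delta = delta_factor * magnitude.mean(axis=1, keepdims=True)
    t = np.sign(centered) * (magnitude > delta)
    kept = np.abs(t)
    # Closed-form scale: mean magnitude of the entries that were not zeroed.
    scale = (magnitude * kept).sum(axis=1, keepdims=True) / np.maximum(
        kept.sum(axis=1, keepdims=True), 1.0)
    return t, scale

# Synthetic weights with a non-zero mean (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
w = rng.normal(loc=0.05, scale=0.02, size=(64, 256))

# Symmetric ternarization: scale only, no shift.
zero_shift = np.zeros((w.shape[0], 1))
t_sym, scale_sym = ternarize_rows(w, zero_shift)
err_sym = np.mean((w - scale_sym * t_sym) ** 2)

# Scale plus shift: here the shift is simply initialized to the row mean.
mean_shift = w.mean(axis=1, keepdims=True)
t_dlt, scale_dlt = ternarize_rows(w, mean_shift)
err_dlt = np.mean((w - (scale_dlt * t_dlt + mean_shift)) ** 2)

print(f"symmetric MSE: {err_sym:.2e}, with shift: {err_dlt:.2e}")
```

In a training setup, the scale and shift would be parameters updated by gradient descent rather than fixed statistics; the row mean and masked mean magnitude used here would serve only as plausible starting values.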

Findings

1. Outperforms previous low-bit quantization methods on text generation benchmarks.

2. Reduces perplexity on the C4 dataset by 5.8 for LLaMA-3 relative to the previous state-of-the-art method.

3. Improves average zero-shot task accuracy by 8.2% over the previous state of the art.

Abstract

Large language models (LLMs) have achieved remarkable performance on Natural Language Processing (NLP) tasks, but they are hindered by high computational costs and memory requirements. Ternarization, an extreme form of quantization, offers a solution by reducing memory usage and enabling energy-efficient floating-point additions. However, applying ternarization to LLMs faces challenges stemming from outliers in both weights and activations. In this work, observing asymmetric outliers and non-zero means in weights, we introduce Dual Learnable Ternarization (DLT), which enables both scales and shifts to be learnable. We also propose Outlier-Friendly Feature Knowledge Distillation (OFF) to recover the information lost in extremely low-bit quantization. The proposed OFF can incorporate semantic information and is insensitive to outliers. At the core of OFF is maximizing the mutual…
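The abstract is cut off before it specifies the OFF objective, so the following is only a hedged sketch of what an outlier-insensitive feature distillation loss could look like. It assumes cosine similarity between student (ternarized) and teacher (floating-point) features as the measure; that choice, the function names, and the synthetic data are assumptions of this example, not details confirmed by the text above. The point illustrated is that a cosine-based loss stays bounded and ignores feature magnitude, while mean squared error is dominated by a single large channel.

```python
import numpy as np

def cosine_feature_loss(student, teacher, eps=1e-8):
    """Mean of (1 - cosine similarity) over token features; lies in [0, 2]."""
    s = student / (np.linalg.norm(student, axis=-1, keepdims=True) + eps)
    t = teacher / (np.linalg.norm(teacher, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

def mse_feature_loss(student, teacher):
    """Plain mean squared error between features, for comparison."""
    return float(np.mean((student - teacher) ** 2))

# Hypothetical features: 8 tokens, 16 channels, student is a noisy teacher.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
student = teacher + 0.1 * rng.normal(size=(8, 16))

# Simulate an outlier channel by inflating channel 0 in both models.
teacher_out = teacher.copy()
student_out = student.copy()
teacher_out[:, 0] *= 100.0
student_out[:, 0] *= 100.0

mse_base = mse_feature_loss(student, teacher)
mse_out = mse_feature_loss(student_out, teacher_out)
cos_base = cosine_feature_loss(student, teacher)
cos_out = cosine_feature_loss(student_out, teacher_out)

# Rescaling the student leaves the cosine loss essentially unchanged.
cos_rescaled = cosine_feature_loss(3.0 * student, teacher)

print(f"MSE:    base={mse_base:.4f}  with outlier channel={mse_out:.4f}")
print(f"cosine: base={cos_base:.4f}  with outlier channel={cos_out:.4f}")
```

Because the cosine term depends only on direction, it cannot grow without bound when one channel carries very large activations, which is one plausible reading of "insensitive to outliers".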

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Knowledge Distillation