TernaryLLM: Ternarized Large Language Model
Tianqi Chen, Zhe Li, Weixiang Xu, Zeyu Zhu, Dong Li, Lu Tian, Emad Barsoum, Peisong Wang, Jian Cheng

TL;DR
TernaryLLM introduces a novel ternarization technique for large language models, reducing memory and computational costs while maintaining high performance through learnable scales, shifts, and a knowledge distillation method that preserves semantic information.
Contribution
The paper proposes Dual Learnable Ternarization and Outlier-Friendly Feature Knowledge Distillation, enabling effective extreme quantization of LLMs with improved accuracy and efficiency.
Findings
Outperforms previous low-bit quantization methods on text generation benchmarks.
Achieves a 5.8 perplexity reduction on the C4 dataset for LLaMA-3.
Improves zero-shot task accuracy by 8.2% over the previous state of the art.
Abstract
Large language models (LLMs) have achieved remarkable performance on Natural Language Processing (NLP) tasks, but they are hindered by high computational costs and memory requirements. Ternarization, an extreme form of quantization, offers a solution by reducing memory usage and enabling energy-efficient floating-point additions. However, applying ternarization to LLMs faces challenges stemming from outliers in both weights and activations. In this work, observing asymmetric outliers and non-zero means in weights, we introduce Dual Learnable Ternarization (DLT), which enables both scales and shifts to be learnable. We also propose Outlier-Friendly Feature Knowledge Distillation (OFF) to recover the information lost in extremely low-bit quantization. The proposed OFF can incorporate semantic information and is insensitive to outliers. At the core of OFF is maximizing the mutual…
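To make the idea of a scale plus a shift concrete, here is a minimal NumPy sketch that approximates each weight row as scale * t + shift with t in {-1, 0, +1}. It is an illustration under stated assumptions, not the paper's method: the function name `ternarize_rows`, the `delta_factor` threshold heuristic, and the closed-form choice of scale and shift are hypothetical stand-ins, whereas DLT treats the scales and shifts as learnable parameters optimized during training (not shown here).

```python
import numpy as np

def ternarize_rows(w, use_shift=True, delta_factor=0.7):
    """Approximate each row of w as scale * t + shift, with t in {-1, 0, +1}.

    Assumption: scale and shift come from simple statistics (a heuristic
    initialization); in the paper they are learned, which is not shown.
    """
    if use_shift:
        # Non-zero row mean is absorbed by the shift.
        shift = w.mean(axis=1, keepdims=True)
    else:
        shift = np.zeros((w.shape[0], 1))
    centered = w - shift
    # Threshold heuristic (assumed): a fraction of the mean absolute value.
    delta = delta_factor * np.abs(centered).mean(axis=1, keepdims=True)
    t = np.sign(centered) * (np.abs(centered) > delta)
    # Scale: mean magnitude of the entries that were kept (non-zero in t).
    kept = np.abs(t).sum(axis=1, keepdims=True)
    scale = (np.abs(centered) * np.abs(t)).sum(axis=1, keepdims=True) / np.maximum(kept, 1)
    return t.astype(np.int8), scale, shift

def dequantize(t, scale, shift):
    """Reconstruct floating-point weights from ternary codes."""
    return scale * t + shift

# Synthetic weights with a non-zero mean (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
w = rng.normal(loc=0.05, scale=0.02, size=(4, 256))

mse_with_shift = np.mean((w - dequantize(*ternarize_rows(w, use_shift=True))) ** 2)
mse_no_shift = np.mean((w - dequantize(*ternarize_rows(w, use_shift=False))) ** 2)
print("MSE with shift:   ", mse_with_shift)
print("MSE without shift:", mse_no_shift)
```

On weights whose mean is far from zero, the symmetric grid {-scale, 0, +scale} wastes one of its three levels, so the shifted variant should reconstruct the row with lower error. That is the motivation the abstract gives for making shifts available alongside scales.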
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Knowledge Distillation
