Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

Minsoo Kim; Sihwa Lee; Janghwan Lee; Sukjin Hong; Du-Seong Chang; Wonyong Sung; Jungwook Choi

arXiv:2308.06744 · cs.CL · December 5, 2023 · 2 citations

TL;DR

This paper introduces token-scaled logit distillation, a knowledge distillation method for quantization-aware training of large generative language models. With ternary weights, the authors report less than 1.0 perplexity degradation and enhanced accuracy on reasoning and understanding tasks.

Contribution

It reports the first evaluation of ternary-weight quantization-aware training on large-scale GLMs, with less than 1.0 degradation in perplexity and enhanced task accuracy.
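To make "ternary weight" concrete, here is a minimal sketch of one common ternarization rule, in which each weight is mapped to one of three values {-alpha, 0, +alpha}. This page does not state which quantizer the paper uses, so the threshold heuristic (0.7 times the mean absolute weight, as in earlier ternary weight network work) and the names `ternarize` and `dequantize` are assumptions for illustration only.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map a flat list of weights to ternary codes {-1, 0, +1} plus a scale.

    ASSUMPTION: the threshold rule (threshold_ratio * mean |w|) is a generic
    heuristic, not necessarily the quantizer used in the paper.
    """
    if not weights:
        return [], 0.0
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs  # weights within [-delta, delta] become 0
    kept = [abs(w) for w in weights if abs(w) > delta]
    # alpha is the mean magnitude of the weights that survive the threshold
    alpha = sum(kept) / len(kept) if kept else 0.0
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    return codes, alpha


def dequantize(codes, alpha):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [alpha * c for c in codes]


codes, alpha = ternarize([0.9, -0.8, 0.05, -0.02, 0.4])
approx = dequantize(codes, alpha)
```

In quantization-aware training, a rule like this is typically applied in the forward pass while gradients update a full-precision copy of the weights; how the paper handles that step is not described on this page.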

Findings

01. Less than 1.0 perplexity degradation in quantized models
02. Improved accuracy in common-sense QA and arithmetic reasoning
03. Effective knowledge distillation for generative language models

Abstract

Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, the large model size poses challenges for practical deployment. To solve this problem, Quantization-Aware Training (QAT) has become increasingly popular. However, current QAT methods for generative models have resulted in a noticeable loss of accuracy. To counteract this issue, we propose a novel knowledge distillation method specifically designed for GLMs. Our method, called token-scaled logit distillation, prevents overfitting and provides superior learning from the teacher model and ground truth. This research marks the first evaluation of ternary weight quantization-aware training of large-scale GLMs with less than 1.0 degradation in perplexity and achieves enhanced accuracy in tasks like common-sense QA and arithmetic reasoning as well as…
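The abstract names the method but does not give its formula. The sketch below illustrates the general idea of scaling a per-token logit distillation loss, under loudly stated assumptions: the per-token loss is the cross-entropy between teacher and student distributions, and the per-token scale is a softmax over the teacher's own cross-entropy against the ground-truth labels. Both choices, and the function names, are hypothetical readings for illustration and should be checked against the paper and the official repository.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(target_probs, logits):
    """Cross-entropy of softmax(logits) against a soft target distribution."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)


def token_scaled_logit_loss(teacher_logits, student_logits, labels, temperature=1.0):
    """Weighted sum of per-token logit distillation losses.

    teacher_logits, student_logits: one logit vector per token position.
    labels: ground-truth token id per position.
    ASSUMPTION: scales come from a softmax over the teacher's per-token
    cross-entropy; the paper's exact scaling rule is not given on this page.
    """
    teacher_probs = [softmax(t) for t in teacher_logits]
    # Teacher's cross-entropy against the ground truth, one value per token
    teacher_ce = [-math.log(p[y]) for p, y in zip(teacher_probs, labels)]
    scales = softmax(teacher_ce, temperature)  # sums to 1 across tokens
    per_token = [soft_cross_entropy(p, s)
                 for p, s in zip(teacher_probs, student_logits)]
    loss = sum(w * l for w, l in zip(scales, per_token))
    return loss, scales


teacher = [[2.0, 0.0], [0.0, 2.0]]
student = [[1.5, 0.2], [0.1, 1.0]]
loss, scales = token_scaled_logit_loss(teacher, student, labels=[0, 0])
```

Under this assumed rule, the second token (where the teacher assigns low probability to the label) receives the larger scale. A real implementation would operate on batched tensors in a framework such as PyTorch rather than on Python lists.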

Code & Models

Repositories

aiha-lab/tsld (PyTorch, official)

Videos

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Knowledge Distillation