Binary and Ternary Natural Language Generation

Zechun Liu; Barlas Oguz; Aasish Pappu; Yangyang Shi; Raghuraman Krishnamoorthi

arXiv:2306.01841·cs.CL·June 6, 2023·1 cites

TL;DR

This paper introduces the first ternary and binary transformer models for text summarization and translation, achieving competitive performance with significantly higher efficiency through novel quantization techniques.

Contribution

It presents a new approach combining statistics-based weight quantization and elastic activation quantization to enable effective training of ternary and binary transformers.
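To make "statistics-based weight quantization" concrete, here is a minimal NumPy sketch. It is not the paper's exact formulation: the threshold rule (`threshold_ratio=0.7` times the mean absolute weight) is a commonly used heuristic from earlier ternary-weight work and is an assumption here, as are the function names `ternarize` and `binarize`. The idea illustrated is only that the discrete codes and a single real-valued scale are both derived from statistics of the weight tensor.

```python
import numpy as np

def ternarize(w, threshold_ratio=0.7):
    """Map w to alpha * {-1, 0, +1} using statistics of |w|.

    threshold_ratio is an assumed heuristic, not a value from the paper.
    """
    abs_w = np.abs(w)
    delta = threshold_ratio * abs_w.mean()        # threshold from mean |w|
    codes = np.where(abs_w > delta, np.sign(w), 0.0)
    mask = codes != 0
    # Scale = mean magnitude of the weights that survived the threshold.
    alpha = abs_w[mask].mean() if mask.any() else 0.0
    return alpha, codes

def binarize(w):
    """Map w to alpha * {-1, +1}, with alpha set to the mean |w|."""
    alpha = np.abs(w).mean()
    codes = np.where(w >= 0, 1.0, -1.0)
    return alpha, codes

# Usage: quantize a small weight matrix and rebuild its approximation.
w = np.array([[0.9, -0.1, 0.05],
              [-0.8, 0.4, -0.02]])
alpha_t, codes_t = ternarize(w)
w_ternary = alpha_t * codes_t   # dequantized approximation of w
```

In training setups of this kind the quantizer is typically applied in the forward pass while gradients flow to the full-precision weights; that part is omitted from the sketch.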

Findings

01

Ternary BART base achieves an R1 score of 41 on CNN/DailyMail, close to the full-precision model.

02

The binary model, while less accurate, achieves an R1 score of 35.6 on CNN/DailyMail.

03

The ternary and binary weight models outperform some 8-bit weight models in the literature.

Abstract

Ternary and binary neural networks enable multiplication-free computation and promise multiple orders of magnitude efficiency gains over full-precision networks if implemented on specialized hardware. However, since both the parameter and the output space are highly discretized, such networks have proven very difficult to optimize. The difficulties are compounded for the class of transformer text generation models due to the sensitivity of the attention operation to quantization and the noise-compounding effects of autoregressive decoding in the high-cardinality output space. We approach the problem with a mix of statistics-based quantization for the weights and elastic quantization of the activations and demonstrate the first ternary and binary transformer models on the downstream tasks of summarization and machine translation. Our ternary BART base achieves an R1 score of 41 on the…
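The "multiplication-free" claim in the abstract can be illustrated with a small sketch. When weights are restricted to {-1, 0, +1}, each output of a matrix-vector product is a sum of some inputs minus a sum of others. The function name `ternary_matvec` and the single per-tensor scale `alpha` are assumptions made for illustration; real kernels on specialized hardware would look very different.

```python
import numpy as np

def ternary_matvec(codes, alpha, x):
    """Compute (alpha * codes) @ x for codes in {-1, 0, +1}.

    Each row needs only additions and subtractions of inputs;
    the only multiplication left is one scale per output.
    """
    out = np.empty(codes.shape[0])
    for i, row in enumerate(codes):
        # Add inputs where the weight is +1, subtract where it is -1,
        # and skip inputs where the weight is 0.
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return alpha * out

# Usage: the result matches the ordinary dense product.
codes = np.array([[1, 0, -1],
                  [0, -1, 1]])
x = np.array([2.0, 3.0, 4.0])
y = ternary_matvec(codes, 0.5, x)
```

The binary case is the same computation with no zero entries, so every input is either added or subtracted.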

Code & Models

Repositories

facebookresearch/ternary_binary_transformer (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Byte Pair Encoding · Softmax · Dense Connections · Residual Connection