AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

Se Jung Kwon; Jeonghoon Kim; Jeongin Bae; Kang Min Yoo; Jin-Hwa Kim; Baeseong Park; Byeongwook Kim; Jung-Woo Ha; Nako Sung; Dongsoo Lee

arXiv:2210.03858 · cs.LG · October 11, 2022 · 1 citation

TL;DR

AlphaTuning combines post-training quantization with fine-tuning of only the scaling factors, so a large language model can be adapted to a target task while being substantially compressed and with far fewer trainable parameters.

Contribution

The paper integrates quantization with parameter-efficient fine-tuning, achieving performance competitive with full fine-tuning from a much smaller model and with far fewer trainable parameters.

Findings

01 Achieves over 10x compression with 4-bit quantization.

02 Reduces trainable parameters by over 1,000x.

03 Performs competitively with full fine-tuning on various tasks.

Abstract

There is growing interest in adapting large-scale language models using parameter-efficient fine-tuning methods. However, accelerating the model itself and achieving better inference efficiency through model compression have not yet been thoroughly explored. Model compression could reduce memory footprints, enable low-precision computations, and ultimately achieve cost-effective inference. To combine parameter-efficient adaptation and model compression, we propose AlphaTuning, which consists of post-training quantization of the pre-trained language model followed by fine-tuning only some parts of the quantized parameters for a target task. Specifically, AlphaTuning employs binary-coding quantization, which factorizes the full-precision parameters into binary parameters and a separate set of scaling factors. During the adaptation phase, the binary values are…
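The factorization described in the abstract can be illustrated with a minimal sketch. It assumes a greedy fitting procedure (take the sign of the residual as the binary code and the mean absolute residual as the scaling factor, then repeat), which is one common way to compute a binary-coding quantization; this page does not confirm that it is the exact procedure used in the paper. The function names `bcq_greedy`, `reconstruct`, and `mse` are hypothetical, and using a single scaling factor per bit for a whole row is a simplification.

```python
import random


def bcq_greedy(row, num_bits):
    """Greedy binary-coding quantization of one weight row (illustrative only).

    Returns (alphas, codes) such that row is approximately
    sum_i alphas[i] * codes[i], where each codes[i] holds +1/-1 values.
    """
    residual = list(row)
    alphas, codes = [], []
    for _ in range(num_bits):
        # Binary code: sign of the current residual.
        code = [1 if r >= 0 else -1 for r in residual]
        # Least-squares optimal scale for that sign code: mean absolute residual.
        alpha = sum(abs(r) for r in residual) / len(residual)
        alphas.append(alpha)
        codes.append(code)
        # Subtract what this bit explains and fit the next bit to the remainder.
        residual = [r - alpha * b for r, b in zip(residual, code)]
    return alphas, codes


def reconstruct(alphas, codes):
    """Rebuild the approximate row from scaling factors and binary codes."""
    n = len(codes[0])
    return [sum(a * c[j] for a, c in zip(alphas, codes)) for j in range(n)]


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(64)]  # stand-in for one weight row

errors = []
for q in (1, 2, 3, 4):
    alphas, codes = bcq_greedy(row, q)
    errors.append(mse(row, reconstruct(alphas, codes)))

# With 4 bits: 4 scaling factors versus 4 * 64 binary values for this row.
num_scaling_factors = len(alphas)
num_binary_values = sum(len(c) for c in codes)
print(errors, num_scaling_factors, num_binary_values)
```

In this toy setting the scaling factors are a small fraction of the stored values, which gives a feel for why adapting only scaling factors keeps the trainable parameter count low. The actual ratios reported for AlphaTuning depend on details of the paper that this sketch does not reproduce.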

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Speech Recognition and Synthesis

Methods: Attention Is All You Need · OPT · Linear Layer · Byte Pair Encoding · Discriminative Fine-Tuning · Layer Normalization · Cosine Annealing · Residual Connection · Dropout