Parameter-Efficient Tuning Makes a Good Classification Head

Zhuoyi Yang; Ming Ding; Yanhui Guo; Qingsong Lv; Jie Tang

arXiv:2210.16771·cs.CL·March 29, 2023

TL;DR

This paper shows that a classification head trained jointly with parameter-efficient tuning can replace the randomly initialized head placed on a pretrained backbone, giving stable performance gains across multiple NLU tasks.

Contribution

It shows that parameter-efficient tuning yields a good pretrained classification head, and that head-only pretraining (LP-FT) is ineffective because the backbone's final-layer output changes greatly during finetuning.

Findings

01 Classification heads pretrained jointly with parameter-efficient tuning improve performance.

02 The approach is effective across 9 GLUE and SuperGLUE tasks.

03 Replacing the randomly initialized head with the pretrained one gives a stable performance gain.

Abstract

In recent years, pretrained models revolutionized the paradigm of natural language understanding (NLU), where we append a randomly initialized classification head after the pretrained backbone, e.g. BERT, and finetune the whole model. As the pretrained backbone makes a major contribution to the improvement, we naturally expect a good pretrained classification head can also benefit the training. However, the final-layer output of the backbone, i.e. the input of the classification head, will change greatly during finetuning, making the usual head-only pretraining (LP-FT) ineffective. In this paper, we find that parameter-efficient tuning makes a good classification head, with which we can simply replace the randomly initialized heads for a stable performance gain. Our experiments demonstrate that the classification head jointly pretrained with parameter-efficient tuning consistently…
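The two-stage idea in the abstract can be illustrated with a minimal PyTorch sketch. It rests on several assumptions that are not stated in the excerpt above: a toy MLP stands in for a pretrained backbone such as BERT, bias-only tuning stands in for the parameter-efficient method, and the second stage is assumed to restart from the original backbone weights with the head carried over. All function names (`make_backbone`, `pretrain_head`, `finetune_with_head`) are hypothetical and are not taken from the authors' repository.

```python
import copy

import torch
from torch import nn


def make_backbone():
    # Hypothetical toy encoder standing in for a pretrained backbone such as BERT.
    return nn.Sequential(
        nn.Linear(16, 32), nn.Tanh(),
        nn.Linear(32, 32), nn.Tanh(),
    )


def train(params, backbone, head, x, y, steps=50, lr=1e-2):
    # Plain full-batch training loop over the given parameters only.
    optimizer = torch.optim.Adam(list(params), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(head(backbone(x)), y)
        loss.backward()
        optimizer.step()
    return loss.item()


def pretrain_head(backbone, x, y, num_classes=2):
    """Stage 1 (assumed): train a fresh head jointly with a small set of
    backbone parameters (here only the biases); all weights stay frozen."""
    tuned = copy.deepcopy(backbone)  # leave the original backbone untouched
    head = nn.Linear(32, num_classes)
    for name, param in tuned.named_parameters():
        param.requires_grad = name.endswith("bias")
    trainable = [p for p in tuned.parameters() if p.requires_grad]
    train(trainable + list(head.parameters()), tuned, head, x, y)
    return head


def finetune_with_head(backbone, head, x, y):
    """Stage 2 (assumed): full finetuning that starts from the original
    backbone and the stage-1 head instead of a randomly initialized head."""
    model = copy.deepcopy(backbone)
    head = copy.deepcopy(head)
    params = list(model.parameters()) + list(head.parameters())
    train(params, model, head, x, y)
    return model, head
```

A caller would run `head = pretrain_head(backbone, x, y)` followed by `finetune_with_head(backbone, head, x, y)`. The contrast with LP-FT in this sketch is that the head is not trained on a completely frozen backbone in the first stage.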

Code & Models

Repositories

thudm/efficient-head-finetuning (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Dense Connections · Linear Layer · Layer Normalization · Residual Connection · Dropout