Parameter-Efficient Tuning by Manipulating Hidden States of Pretrained Language Models For Classification Tasks

Haoran Yang; Piji Li; Wai Lam

arXiv:2204.04596 · cs.CL · April 14, 2022 · 1 citation

Open Access

TL;DR

This paper proposes a simple, parameter-efficient method for classification that directly uses integrated hidden states from pretrained language models, achieving comparable results to prompt tuning with fewer parameters and less training time.

Contribution

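It introduces a tuning approach that integrates the hidden states of all PLM layers using only three trainable vectors and feeds the result to a task-specific linear classifier, leaving the hidden states themselves unchanged instead of applying nonlinear transformations to them.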

Findings

01 Achieves performance comparable to prompt tuning methods.

02 Uses fewer parameters and less training time than prompt tuning methods.

03 Validates that the hidden states contain sufficient information for classification.

Abstract

Parameter-efficient tuning aims to distill knowledge for downstream tasks by optimizing a few introduced parameters while freezing the pretrained language models (PLMs). Continuous prompt tuning, which prepends a few trainable vectors to the input embeddings, is one such method and has drawn much attention for its effectiveness and efficiency. This family of methods can be viewed as applying nonlinear transformations to the hidden states inside PLMs. However, a natural question has been overlooked: can the hidden states be used directly for classification without changing them? In this paper, we answer this question by proposing a simple tuning method that introduces only three trainable vectors. First, we integrate the hidden states of all layers using the introduced vectors. Then, we feed the integrated hidden state(s) to a task-specific linear classifier to predict…
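The two steps named in the abstract (integrate the hidden states of all layers, then apply a linear classifier) can be illustrated with a minimal sketch. The abstract is truncated here and does not say what role each of the three trainable vectors plays, so this sketch is not the authors' method. It assumes a single trainable vector of per-layer scores that is softmax-normalized and used to average the layers, in the style of a scalar mix. All function names, the toy hidden states, and the classifier weights are hypothetical.

```python
import math


def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def integrate_layers(hidden_states, layer_scores):
    """Weighted average of per-layer hidden states.

    hidden_states: one vector per layer from a frozen PLM (e.g. the
        first-token state of each layer); they are read, never modified.
    layer_scores: ASSUMED trainable vector with one score per layer.
    """
    weights = softmax(layer_scores)
    dim = len(hidden_states[0])
    return [
        sum(w * layer[i] for w, layer in zip(weights, hidden_states))
        for i in range(dim)
    ]


def linear_classifier(features, weight, bias):
    """Task-specific linear layer: logits = weight @ features + bias."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weight, bias)
    ]


def predict(hidden_states, layer_scores, weight, bias):
    """Return the index of the highest-scoring class."""
    pooled = integrate_layers(hidden_states, layer_scores)
    logits = linear_classifier(pooled, weight, bias)
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    # Toy stand-in for a 2-layer PLM with hidden size 2.
    toy_states = [[1.0, 0.0], [3.0, 2.0]]
    toy_scores = [0.0, 0.0]  # equal scores give a plain average
    toy_weight = [[1.0, 0.0], [0.0, 1.0]]
    toy_bias = [0.0, 0.0]
    print(predict(toy_states, toy_scores, toy_weight, toy_bias))
```

In this sketch only `layer_scores`, `weight`, and `bias` would be trained; the hidden states come from the frozen model and are read but never altered, which is the property the abstract emphasizes.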

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Bidirectional LSTM · Softmax · ELMo