Parameter-Efficient Tuning by Manipulating Hidden States of Pretrained Language Models For Classification Tasks
Haoran Yang, Piji Li, Wai Lam

TL;DR
This paper proposes a simple, parameter-efficient method for classification that directly uses integrated hidden states from pretrained language models, achieving results comparable to prompt tuning with fewer parameters and less training time.
Contribution
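It introduces a tuning approach that integrates the hidden states of a frozen pretrained language model using only three trainable vectors and feeds the result to a task-specific linear classifier, leaving the hidden states themselves unchanged rather than transforming them nonlinearly as prompt tuning does.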
Findings
Achieves comparable performance to prompt tuning methods.
Uses fewer parameters and less training time.
Validates that hidden states contain sufficient information for classification.
Abstract
Parameter-efficient tuning aims to distill knowledge for downstream tasks by optimizing a few introduced parameters while freezing the pretrained language models (PLMs). Continuous prompt tuning, which prepends a few trainable vectors to the input embeddings, is one of these methods and has drawn much attention due to its effectiveness and efficiency. This family of methods can be viewed as applying nonlinear transformations to the hidden states inside PLMs. However, a natural question has been ignored: can the hidden states be used directly for classification without changing them? In this paper, we aim to answer this question by proposing a simple tuning method that introduces only three trainable vectors. First, we integrate the hidden states of all layers using the introduced vectors. We then feed the integrated hidden state(s) to a task-specific linear classifier to predict…
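The abstract is cut off before it specifies how the three trainable vectors are used, so the following is only a rough sketch of the general idea (mix the frozen hidden states of all layers, pool them, and apply a linear classifier), not the authors' method. Every name in it is hypothetical: `layer_logits` is an assumed vector of per-layer mixing weights, `token_query` is an assumed vector for attention-style pooling over tokens, and random arrays stand in for the frozen PLM's hidden states.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def integrate_hidden_states(hidden, layer_logits, token_query):
    """Combine frozen hidden states into a single vector.

    hidden:       (num_layers, seq_len, dim) outputs of a frozen PLM
    layer_logits: (num_layers,) trainable, ASSUMED layer-mixing vector
    token_query:  (dim,) trainable, ASSUMED token-pooling vector
    """
    layer_w = softmax(layer_logits)                 # (num_layers,)
    mixed = np.tensordot(layer_w, hidden, axes=1)   # (seq_len, dim)
    token_w = softmax(mixed @ token_query)          # (seq_len,)
    return token_w @ mixed                          # (dim,)

def classify(hidden, layer_logits, token_query, W, b):
    # Task-specific linear classifier on the integrated hidden state.
    pooled = integrate_hidden_states(hidden, layer_logits, token_query)
    return softmax(W @ pooled + b)

# Stand-in data: random "hidden states" instead of a real PLM forward pass.
rng = np.random.default_rng(0)
num_layers, seq_len, dim, num_classes = 12, 8, 16, 3
hidden = rng.normal(size=(num_layers, seq_len, dim))

# Zero initialization gives uniform weights over layers and tokens.
layer_logits = np.zeros(num_layers)
token_query = np.zeros(dim)
W = rng.normal(size=(num_classes, dim)) * 0.1
b = np.zeros(num_classes)

pooled = integrate_hidden_states(hidden, layer_logits, token_query)
probs = classify(hidden, layer_logits, token_query, W, b)
```

In this sketch only `layer_logits`, `token_query`, `W`, and `b` would be trained; the hidden states are read from the PLM and never modified, which is the property the abstract emphasizes. With the zero initialization shown, the integrated state reduces to a plain average over layers and tokens.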
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Bidirectional LSTM · Softmax · ELMo
