Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning

Shuo Xie; Jiahao Qiu; Ankita Pasad; Li Du; Qing Qu; Hongyuan Mei

arXiv:2210.10041·cs.CL·October 20, 2022

Open Access

TL;DR

This paper introduces a simple, efficient method to select which layers of a pretrained language model to adapt during transfer learning, reducing computation without sacrificing performance.

Contribution

It proposes a layer selection technique based on hidden state variability, enabling effective transfer learning with fewer layers and less computation.

Findings

01. Layer selection based on variability improves transfer performance.

02. Method matches full fine-tuning performance with fewer layers.

03. Approach is robust to data imbalance and scarcity.

Abstract

When transferring a pretrained language model, common approaches attach the task-specific classifier to the top layer and adapt all the pretrained layers. We investigate whether one could make a task-specific selection of which subset of the layers to adapt and where to place the classifier. The goal is to reduce the computation cost of transfer learning methods (e.g., fine-tuning or adapter-tuning) without sacrificing their performance. We propose to select layers based on the variability of their hidden states given a task-specific corpus. We say a layer is already "well-specialized" in a task if the within-class variability of its hidden states is low relative to the between-class variability. Our variability metric is cheap to compute and does not need any training or hyperparameter tuning. It is robust to data imbalance and data scarcity. Extensive experiments on…
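
To make the "within-class relative to between-class" idea concrete, here is a minimal sketch in Python with NumPy. It is not the paper's exact metric: it assumes a simple ratio of summed squared deviations, and the function names `variability_ratio` and `most_specialized_layer` are hypothetical. It also assumes one hidden-state vector per example has already been extracted for each layer, and that the corpus contains at least two classes with distinct means.

```python
import numpy as np

def variability_ratio(hidden, labels):
    """Within-class scatter divided by between-class scatter.

    hidden: array of shape (num_examples, hidden_dim), one vector per example.
    labels: array of shape (num_examples,), the class of each example.
    A lower value means the classes are tighter relative to their separation.
    Illustrative simplification, not the paper's definition.
    """
    hidden = np.asarray(hidden, dtype=float)
    labels = np.asarray(labels)
    global_mean = hidden.mean(axis=0)
    within = 0.0
    between = 0.0
    for c in np.unique(labels):
        class_states = hidden[labels == c]
        class_mean = class_states.mean(axis=0)
        # Spread of examples around their own class mean.
        within += ((class_states - class_mean) ** 2).sum()
        # Spread of the class mean around the global mean, weighted by class size.
        between += len(class_states) * ((class_mean - global_mean) ** 2).sum()
    return within / between

def most_specialized_layer(per_layer_hidden, labels):
    """Return the index of the layer with the lowest ratio, plus all scores.

    Taking the minimum is an assumed selection rule for illustration only.
    """
    scores = [variability_ratio(h, labels) for h in per_layer_hidden]
    return int(np.argmin(scores)), scores
```

Because the sketch only needs forward-pass hidden states and class labels, it involves no training and no hyperparameters, which is the property the abstract emphasizes.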

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis