Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

Colin Wei; Sang Michael Xie; Tengyu Ma

arXiv:2106.09226 · cs.LG · April 22, 2022 · 33 cites

TL;DR

This paper provides a theoretical framework linking pretraining and downstream NLP tasks using latent variable models, analyzing head and prompt tuning, and demonstrating conditions under which these methods succeed, supported by synthetic experiments.

Contribution

It introduces a generative model-based analysis of head and prompt tuning, revealing conditions for successful downstream task adaptation and comparing their effectiveness.

Findings

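1. Under certain non-degeneracy conditions on the HMM, simple classification heads (head tuning) can solve the downstream task.
2. Prompt tuning obtains downstream guarantees under weaker non-degeneracy conditions.
3. Recovery guarantees are stronger for the memory-augmented HMM than for the vanilla HMM, because task-relevant information is easier to recover from the long-term memory.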
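To make the role of a non-degeneracy condition concrete, here is a small sketch. It assumes, for illustration only, that the condition is that the HMM's emission matrix `W` (rows indexed by tokens, columns by hidden states) has linearly independent columns, which needs at least as many tokens as hidden states; the paper's exact conditions may be stated differently. In an HMM a token is conditionally independent of its context given its hidden state, so an ideal masked-token predictor outputs `p = W q`, where `q` is the posterior over the hidden state at that position. With independent columns, `q` is a linear function of `p`; with two identical columns, distinct posteriors yield the same `p`. All helper names and numbers are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(X, Y):
    cols = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def solve(M, b):
    """Solve M y = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("singular system: emission columns are linearly dependent")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * piv for a, piv in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def token_dist(W, q):
    """Ideal masked-token prediction: p[x] = sum_h W[x][h] * q[h], q[h] = P(H_t = h | X_{-t})."""
    return matvec(W, q)

def recover_posterior(W, p):
    """Linear map q = (W^T W)^{-1} W^T p; exact when W has full column rank."""
    Wt = transpose(W)
    return solve(matmul(Wt, W), matvec(Wt, p))

# Columns are the emission distributions of 2 hidden states over 3 tokens (made-up numbers).
W_good = [[0.5, 0.1],
          [0.4, 0.3],
          [0.1, 0.6]]
q = [0.3, 0.7]
q_hat = recover_posterior(W_good, token_dist(W_good, q))  # approximately [0.3, 0.7]

# Degenerate case: both hidden states emit tokens identically.
W_bad = [[0.5, 0.5],
         [0.4, 0.4],
         [0.1, 0.1]]
p_a = token_dist(W_bad, [0.3, 0.7])
p_b = token_dist(W_bad, [0.9, 0.1])  # same output as p_a, so q cannot be recovered
```

A head tuned on downstream data would have to learn such a map rather than compute it from a known `W`; the sketch only shows that a linear map exists when the columns are independent and fails to exist when they are not.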

Abstract

Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads…
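The framework treats the downstream label as a function of the posterior over an HMM's latent states. The sketch below computes that posterior exactly with the standard forward-backward algorithm and passes it to a linear head, standing in for the frozen pretrained model plus trained classifier. It is a minimal illustration, assuming a toy two-state, three-token HMM with made-up parameters and a hypothetical head; it is not code from the authors' repository. `brute_force_posterior` enumerates every hidden sequence as a sanity check.

```python
from itertools import product

def hmm_posterior(pi, A, B, obs, t):
    """Return [P(H_t = h | X = obs) for each hidden state h] via forward-backward.

    pi[h]   : initial probability P(H_0 = h)
    A[g][h] : transition probability P(H_{s+1} = h | H_s = g)
    B[h][x] : emission probability P(X_s = x | H_s = h)
    """
    n, T = len(pi), len(obs)
    # Forward pass: alpha[s][h] = P(x_0, ..., x_s, H_s = h)
    alpha = [[pi[h] * B[h][obs[0]] for h in range(n)]]
    for s in range(1, T):
        prev = alpha[-1]
        alpha.append([sum(prev[g] * A[g][h] for g in range(n)) * B[h][obs[s]]
                      for h in range(n)])
    # Backward pass: beta[s][h] = P(x_{s+1}, ..., x_{T-1} | H_s = h)
    beta = [[1.0] * n for _ in range(T)]
    for s in range(T - 2, -1, -1):
        beta[s] = [sum(A[h][g] * B[g][obs[s + 1]] * beta[s + 1][g] for g in range(n))
                   for h in range(n)]
    # The posterior at position t is proportional to alpha[t] * beta[t].
    joint = [alpha[t][h] * beta[t][h] for h in range(n)]
    z = sum(joint)
    return [j / z for j in joint]

def brute_force_posterior(pi, A, B, obs, t):
    """Same quantity by summing over every hidden sequence (exponential in len(obs))."""
    n = len(pi)
    totals = [0.0] * n
    for hs in product(range(n), repeat=len(obs)):
        p = pi[hs[0]] * B[hs[0]][obs[0]]
        for s in range(1, len(obs)):
            p *= A[hs[s - 1]][hs[s]] * B[hs[s]][obs[s]]
        totals[hs[t]] += p
    z = sum(totals)
    return [v / z for v in totals]

def linear_head(weights, bias, features):
    """Hypothetical downstream head: argmax over classes c of weights[c] . features + bias[c]."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return max(range(len(scores)), key=lambda c: scores[c])

# Toy HMM: 2 hidden states, 3 tokens (made-up numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 2, 2, 1]

post = hmm_posterior(pi, A, B, obs, t=0)
# Hypothetical head: predict class c when hidden state c is more likely at t = 0.
label = linear_head([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], post)
```

Forward-backward costs O(T n^2) for T tokens and n hidden states, whereas the brute-force check grows as n^T, so it is only practical for toy sequences like this one.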

Code & Models

Repositories

sangmichaelxie/pretraining_analysis
pytorch · Official

Videos

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis