Extracting Latent Steering Vectors from Pretrained Language Models

Nishant Subramani; Nivedita Suresh; Matthew E. Peters

arXiv:2205.05124 · cs.CL · May 12, 2022 · 1 citation

TL;DR

This paper shows that steering vectors can be extracted from pretrained language models without fine-tuning. Added to the model's hidden states, these vectors reproduce target sentences almost exactly, support unsupervised sentiment transfer through vector arithmetic, and provide a measure of sentence similarity.

Contribution

The authors introduce a method for extracting latent steering vectors directly from pretrained language model decoders and using them for controllable text generation, without fine-tuning the model.

Findings

01

Adding a steering vector to the model's hidden states regenerates a target sentence nearly perfectly (> 99 BLEU) for English sentences from a variety of domains.

02

Vector arithmetic on steering vectors enables unsupervised sentiment transfer on the Yelp sentiment benchmark, with performance comparable to models tailored to this task.

03

Distances between steering vectors correlate with sentence similarity, outperforming pooled hidden states.
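
The second and third findings can be illustrated with a small NumPy sketch. It is not the authors' code: the random arrays stand in for steering vectors that would already have been extracted, and the mean-difference offset, the `scale` parameter, and the choice of cosine distance are assumptions made here for illustration, since the summary above only says "vector arithmetic" and "distances".

```python
import numpy as np

def style_offset(source_vecs, target_vecs):
    """Assumed recipe: difference between the mean steering vectors of two styles."""
    return np.mean(target_vecs, axis=0) - np.mean(source_vecs, axis=0)

def shift(z, offset, scale=1.0):
    """Move one steering vector along the style offset (scale is hypothetical)."""
    return z + scale * offset

def cosine_distance(a, b):
    """1 - cosine similarity; one possible distance between steering vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for extracted steering vectors (random, NOT real model outputs).
rng = np.random.default_rng(0)
dim = 8
direction = np.zeros(dim)
direction[0] = 1.0
negative = 0.1 * rng.normal(size=(20, dim)) - direction
positive = 0.1 * rng.normal(size=(20, dim)) + direction

offset = style_offset(negative, positive)
z_neg = negative[0]
z_shifted = shift(z_neg, offset)   # would be decoded by the frozen model in practice
positive_center = positive.mean(axis=0)
```

In this toy setup the shifted vector ends up closer to the centre of the "positive" cluster than the original does. With a real model, `z_shifted` would be added to the hidden states and decoded to obtain the transferred sentence.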

Abstract

Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We hypothesize that the information needed to steer the model to generate a target sentence is already encoded within the model. Accordingly, we explore a different approach altogether: extracting latent vectors directly from pretrained language model decoders without fine-tuning. Experiments show that there exist steering vectors, which, when added to the hidden states of the language model, generate a target sentence nearly perfectly (> 99 BLEU) for English sentences from a variety of domains. We show that vector arithmetic can be used for unsupervised sentiment transfer on the Yelp sentiment benchmark, with performance comparable to models tailored to this task. We find that distances between…
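
A minimal sketch of the extraction idea, under stated assumptions: a tiny frozen linear model with random weights stands in for the pretrained decoder, the steering vector `z` is added to the hidden state at every step, and `z` is found by plain gradient descent on the target sentence's negative log-likelihood. The toy model, the optimizer, the step size, and all names (`hidden_states`, `extract_steering_vector`, and so on) are hypothetical choices made here, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, max_len = 12, 16, 8

# Frozen toy "decoder" (hypothetical stand-in for a pretrained language model).
E = rng.normal(size=(vocab, dim))    # token embeddings
P = rng.normal(size=(max_len, dim))  # position embeddings
W = rng.normal(size=(vocab, dim))    # output projection

def hidden_states(prefix):
    """Hidden state at each step, built from the previous token and the position."""
    return E[prefix] + P[: len(prefix)]

def log_softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def loss_and_grad(z, target, bos=0):
    """Mean negative log-likelihood of target and its gradient with respect to z."""
    prefix = np.concatenate(([bos], target[:-1]))  # teacher forcing
    h = hidden_states(prefix) + z                  # steering vector added at every step
    logp = log_softmax(h @ W.T)
    steps = np.arange(len(target))
    loss = -logp[steps, target].mean()
    err = np.exp(logp)
    err[steps, target] -= 1.0                      # softmax minus one-hot
    grad = (err @ W).mean(axis=0)
    return loss, grad

def extract_steering_vector(target, n_steps=500):
    """Optimize only z; the model weights E, P, W are never updated."""
    z = np.zeros(dim)
    lr = 1.0 / np.linalg.norm(W, 2) ** 2           # conservative step size for this toy loss
    for _ in range(n_steps):
        _, grad = loss_and_grad(z, target)
        z -= lr * grad
    return z

target = np.array([3, 7, 1, 9, 4])                 # hypothetical token ids
loss_before, _ = loss_and_grad(np.zeros(dim), target)
z = extract_steering_vector(target)
loss_after, _ = loss_and_grad(z, target)
```

The sketch only shows the shape of the optimization loop: the target becomes more likely under the frozen toy model once `z` is added. It makes no attempt to reproduce the near-perfect recovery reported in the abstract, which depends on a real pretrained decoder.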

Code & Models

Repositories

nishantsubramani/steering_vectors
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining