Word Embeddings Are Steers for Language Models

Chi Han; Jialiang Xu; Manling Li; Yi Fung; Chenkai Sun; Nan Jiang; Tarek Abdelzaher; Heng Ji

arXiv:2305.12798 · cs.CL · June 7, 2024 · 2 cites

TL;DR

This paper introduces LM-Steers, a method to control language model generation styles through linear transformations of output word embeddings, demonstrating interpretability, transferability, and effectiveness in style control tasks.

Contribution

The work reveals that linear transformations of output word embeddings can steer language model styles, providing a new interpretable and transferable control mechanism.
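The contribution rests on a simple identity: scoring each word v with a transformed output embedding (I + εW)e_v gives the same logits as transforming the hidden state h once and scoring it against the original embeddings. The toy sketch below checks that identity in pure Python. It assumes the steer has the form I + εW, with W a d×d matrix and ε a scalar strength; the function names and numbers are hypothetical and are not taken from the lm-steer repository.

```python
# Toy sketch, assuming the steer is e_v' = (I + epsilon * W) e_v applied to
# every output word embedding e_v. Names and values are hypothetical.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steer_matrix(w, epsilon):
    """Build I + epsilon * W for a square matrix W."""
    d = len(w)
    return [[(1.0 if i == j else 0.0) + epsilon * w[i][j] for j in range(d)]
            for i in range(d)]

def logits_transformed_embeddings(h, emb, w, epsilon):
    """Score each word with its transformed embedding: h . (I + eps W) e_v."""
    m = steer_matrix(w, epsilon)
    return [dot(h, matvec(m, e_v)) for e_v in emb]

def logits_transformed_hidden(h, emb, w, epsilon):
    """Equivalent view: transform h once, h' = (I + eps W)^T h, then h' . e_v."""
    m = steer_matrix(w, epsilon)
    h_steered = matvec(transpose(m), h)
    return [dot(h_steered, e_v) for e_v in emb]

# Usage with a 3-dimensional hidden state and a 3-word toy vocabulary.
h = [0.5, -1.0, 2.0]
emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.3, 0.3, 0.3]]
w = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.5], [-1.0, 0.0, 0.0]]
print(logits_transformed_embeddings(h, emb, w, 0.1))
print(logits_transformed_hidden(h, emb, w, 0.1))
```

The second view is why such a steer is cheap at inference time in this sketch: one matrix-vector product on h per step, rather than transforming every row of the output embedding table.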

Findings

01

LM-Steers exist in language models of all sizes.

02

Learning an LM-Steer for each style requires only 0.2% as many parameters as the original model.

03

LM-Steers achieve competitive results in style control tasks.
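The 0.2% figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes, for illustration only, that one steer is a single full d×d matrix and that the base model is GPT-2 large (hidden size 1280, roughly 774M parameters); which model and steer shape the paper used for this figure is an assumption here, not something stated on this page.

```python
# Back-of-envelope check, assuming one steer = one full d x d matrix.
# The model size below (GPT-2 large) is an illustrative assumption.

def steer_param_fraction(hidden_size: int, model_params: int) -> float:
    """Return the steer's parameter count as a fraction of the base model's."""
    steer_params = hidden_size * hidden_size  # entries of a d x d matrix
    return steer_params / model_params

fraction = steer_param_fraction(1280, 774_000_000)
print(f"{fraction:.2%}")  # -> 0.21%
```

Under these assumptions, the count scales with the square of the hidden size rather than with depth, so the fraction depends on how a given model trades width against layers.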

Abstract

Language models (LMs) automatically learn word embeddings during pre-training on language corpora. Although word embeddings are usually interpreted as feature vectors for individual words, their roles in language model generation remain underexplored. In this work, we theoretically and empirically revisit output word embeddings and find that their linear transformations are equivalent to steering language model generation styles. We name such steers LM-Steers and find that they exist in LMs of all sizes. Steering each style requires learning parameters equal to 0.2% of the original LM's size. On tasks such as language model detoxification and sentiment control, LM-Steers achieve performance comparable or superior to state-of-the-art controlled generation methods while maintaining a better balance with generation quality. The learned LM-Steer serves as a lens in text…
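One way to see how a single steer can push generation toward or away from a style: expanding h^T(I + εW)e_v gives h·e_v + ε·h^T W e_v, so the steer adds ε times a per-word "style score" to each logit. The sketch below uses made-up base logits and style scores for a three-word vocabulary (hypothetical values, not from the paper or any real model) and shows that a positive ε shifts probability toward the word with the highest style score, while a negative ε shifts it toward the lowest, loosely mirroring sentiment control in either direction.

```python
import math

# Toy sketch: logits_v = h.e_v + epsilon * h^T W e_v.
# base_logits[v] stands in for h.e_v and style_scores[v] for h^T W e_v;
# all numbers are hypothetical.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def steered_probs(base_logits, style_scores, epsilon):
    """Next-word distribution after adding epsilon * style score to each logit."""
    return softmax([a + epsilon * b for a, b in zip(base_logits, style_scores)])

vocab = ["great", "fine", "awful"]
base_logits = [1.0, 1.2, 0.8]
style_scores = [2.0, 0.0, -2.0]  # hypothetical "positive sentiment" direction

for eps in (-1.0, 0.0, 1.0):
    probs = steered_probs(base_logits, style_scores, eps)
    print(eps, dict(zip(vocab, (round(p, 3) for p in probs))))
```

Because "great" has the strictly largest style score here, its probability rises monotonically with ε, and "awful" (strictly smallest score) gains when ε is negative.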

Code & Models

Repositories

glaciohound/lm-steer
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques

Methods: Balanced Selection