Style Vectors for Steering Generative Large Language Model

Kai Konen; Sophie Jentzsch; Diaoulé Diallo; Peer Schütt; Oliver Bensch; Roxanne El Baff; Dominik Opitz; Tobias Hecking

arXiv:2402.01618 · cs.CL · February 5, 2024 · 1 citation

TL;DR

This paper introduces a method for steering large language models' output styles by adding style vectors to hidden layer activations, enabling nuanced and parameterizable style control without complex training.

Contribution

It presents a simple activation engineering approach to compute style vectors from recorded activations, offering an effective alternative to prompt engineering for style control in LLMs.
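As a rough illustration of what "computing style vectors from recorded activations" could look like, the sketch below takes the mean activation recorded for texts in a target style and subtracts the mean activation recorded for texts in other styles. The mean-difference formula, the helper names `mean_vector` and `style_vector`, and the toy numbers are assumptions made for illustration; they are not taken from this page or from the paper's repository.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of equally sized activation vectors."""
    return [fmean(column) for column in zip(*vectors)]


def style_vector(target_activations, other_activations):
    """Mean activation of the target style minus mean activation of the rest.

    Assumption: a simple mean difference stands in for the style vector.
    """
    target_mean = mean_vector(target_activations)
    other_mean = mean_vector(other_activations)
    return [t - o for t, o in zip(target_mean, other_mean)]


# Toy 3-dimensional "activations" recorded at one layer (made-up numbers).
positive = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
negative = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v_positive = style_vector(positive, negative)
print(v_positive)  # [2.0, -2.0, 0.0]
```

With a real model, the activation lists would be replaced by hidden states recorded at a chosen layer while the model processes texts of each style. No gradient-based training is involved, which is the contrast with training-based approaches that the summary draws.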

Findings

01. Style vectors effectively influence the style of generated text.

02. Activation-based style control is nuanced and parameterizable.

03. Activation engineering with style vectors offers an effective alternative to prompt engineering for style steering.

Abstract

This research explores strategies for steering the output of large language models (LLMs) towards specific styles, such as sentiment, emotion, or writing style, by adding style vectors to the activations of hidden layers during text generation. We show that style vectors can be simply computed from recorded layer activations for input texts in a specific style in contrast to more complex training-based approaches. Through a series of experiments, we demonstrate the effectiveness of activation engineering using such style vectors to influence the style of generated text in a nuanced and parameterisable way, distinguishing it from prompt engineering. The presented research constitutes a significant step towards developing more adaptive and effective AI-empowered interactive systems.
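The steering step described in the abstract, adding a style vector to hidden-layer activations in a parameterisable way, can be sketched as a scaled vector addition. The function name `steer`, the scalar `strength` parameter, and the toy values are hypothetical and chosen only for illustration. In an actual model the addition would typically happen inside a hook on a chosen layer during generation, and that wiring is omitted here.

```python
def steer(activation, style_vec, strength):
    """Return activation + strength * style_vec, element-wise.

    Assumption: a single scalar `strength` controls how strongly the
    style vector shifts the hidden state; 0.0 leaves it unchanged.
    """
    if len(activation) != len(style_vec):
        raise ValueError("activation and style vector must have the same size")
    return [a + strength * v for a, v in zip(activation, style_vec)]


# Toy hidden state and style vector (made-up numbers).
hidden = [0.5, -1.0, 2.0]
v_style = [2.0, -2.0, 0.0]

# Sweeping the strength shows the graded control.
for strength in (0.0, 0.5, 1.0):
    print(strength, steer(hidden, v_style, strength))
```

Because the strength is a continuous value, the amount of style shift can be tuned gradually, which a fixed prompt instruction does not offer in the same direct way.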

Code & Models

Repositories

dlr-sc/style-vectors-for-steering-llms
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques