Style Vectors for Steering Generative Large Language Model
Kai Konen, Sophie Jentzsch, Diaoulé Diallo, Peer Schütt, Oliver Bensch, Roxanne El Baff, Dominik Opitz, Tobias Hecking

TL;DR
This paper introduces a method for steering large language models' output styles by adding style vectors to hidden layer activations, enabling nuanced and parameterizable style control without complex training.
Contribution
It presents a simple activation engineering approach to compute style vectors from recorded activations, offering an effective alternative to prompt engineering for style control in LLMs.
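A minimal sketch of how a style vector might be computed from recorded activations. It assumes a mean-difference formulation (mean activation of target-style inputs minus mean activation of other-style inputs at one layer), which the summary above does not spell out. The function names and toy numbers are hypothetical and stand in for activations recorded from a real model.

```python
from statistics import fmean


def mean_activation(activations):
    """Element-wise mean of a list of equal-length activation vectors."""
    if not activations:
        raise ValueError("need at least one activation vector")
    return [fmean(dim) for dim in zip(*activations)]


def style_vector(target_acts, other_acts):
    """Assumed formulation: mean target-style activation minus mean other-style activation."""
    target_mean = mean_activation(target_acts)
    other_mean = mean_activation(other_acts)
    return [t - o for t, o in zip(target_mean, other_mean)]


# Toy stand-ins for activations recorded at one hidden layer (hypothetical values).
positive_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
negative_acts = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]

v_positive = style_vector(positive_acts, negative_acts)
print(v_positive)
```

No gradient step or fine-tuning appears anywhere in this sketch, which is the sense in which the vectors are "simply computed" rather than trained.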
Findings
Style vectors effectively influence generated text style.
Activation-based style control is nuanced and parameterizable.
The method is distinct from prompt engineering and offers an alternative to it for style steering.
Abstract
This research explores strategies for steering the output of large language models (LLMs) towards specific styles, such as sentiment, emotion, or writing style, by adding style vectors to the activations of hidden layers during text generation. We show that style vectors can be simply computed from recorded layer activations for input texts in a specific style in contrast to more complex training-based approaches. Through a series of experiments, we demonstrate the effectiveness of activation engineering using such style vectors to influence the style of generated text in a nuanced and parameterisable way, distinguishing it from prompt engineering. The presented research constitutes a significant step towards developing more adaptive and effective AI-empowered interactive systems.
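The steering step described in the abstract, adding a style vector to hidden-layer activations, can be illustrated with a short sketch. The scaling factor `strength` is an assumed name for the parameter that makes the control "parameterisable", and the vectors are toy values rather than real model activations.

```python
def steer(hidden, style_vec, strength):
    """Return the hidden activation shifted by a scaled style vector."""
    if len(hidden) != len(style_vec):
        raise ValueError("hidden state and style vector must have the same size")
    return [h + strength * v for h, v in zip(hidden, style_vec)]


# Hypothetical hidden activation and style vector for a single layer.
hidden = [0.5, -1.0, 2.0]
style_vec = [1.0, 1.0, 0.0]

# A strength of 0 leaves the activation unchanged; larger values push it
# further along the style direction.
for strength in (0.0, 0.5, 1.0):
    print(strength, steer(hidden, style_vec, strength))
```

In a real model this addition would presumably be applied inside the forward pass of the chosen layers during generation, for example through a framework-level hook. That wiring is omitted here because it depends on the model implementation.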
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
