Multi-property Steering of Large Language Models with Dynamic Activation Composition

Daniel Scalena; Gabriele Sarti; Malvina Nissim

arXiv:2406.17563·cs.CL·December 2, 2024

TL;DR

This paper introduces Dynamic Activation Composition, an information-theoretic method for multi-property steering of large language models, which adaptively modulates conditioning properties during generation to improve robustness and fluency.

Contribution

The paper proposes Dynamic Activation Composition, a technique that modulates the steering intensity of one or more properties throughout generation. It addresses a limitation of static steering methods, whose optimal parameters depend on the property being steered.

Findings

01 Successfully maintains high conditioning for multiple properties
02 Minimizes impact on generation fluency
03 Outperforms static activation steering methods

Abstract

Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models' intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting the property-dependent nature of optimal parameters to ensure a robust effect throughout generation. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach to modulate the steering intensity of one or more properties throughout generation. Our experiments on multi-property steering show that our method successfully maintains high conditioning while minimizing the impact of conditioning on generation fluency.
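
The two ideas named in the abstract, additive intervention on intermediate representations and an information-theoretic signal that modulates steering intensity, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the function names, the tiny unembedding matrix, and the specific rule (steering intensity set to the KL divergence between the steered and unsteered next-token distributions, clipped to a maximum) are hypothetical stand-ins for whatever the authors actually use.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def steer(hidden, steering_vector, alpha):
    """Additive intervention on a hidden state: h' = h + alpha * v."""
    return [h + alpha * v for h, v in zip(hidden, steering_vector)]

def logits_from_hidden(hidden, unembed):
    """Toy stand-in for the model's output head: one dot product per token."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]

def dynamic_alpha(p_plain, p_steered, max_alpha=2.0):
    """ASSUMED rule: intensity = KL(steered || plain), clipped to [0, max_alpha]."""
    return min(max(kl_divergence(p_steered, p_plain), 0.0), max_alpha)

# Hypothetical 2-dimensional hidden state, steering direction, and 3-token vocabulary.
hidden = [0.5, -0.2]
steering_vector = [1.0, 0.0]
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
max_alpha = 2.0

# Compare next-token distributions without steering and with full-strength steering.
p_plain = softmax(logits_from_hidden(hidden, unembed))
p_full = softmax(logits_from_hidden(steer(hidden, steering_vector, max_alpha), unembed))

# Choose this step's intensity from how far steering moves the distribution.
alpha = dynamic_alpha(p_plain, p_full, max_alpha)
steered_hidden = steer(hidden, steering_vector, alpha)
print(f"alpha for this step: {alpha:.3f}")
```

In this sketch the intensity is recomputed at every generation step, so steering fades when the steered and unsteered predictions already agree. For several properties, one such intensity could be computed per steering vector and the scaled vectors summed, though whether that matches the paper's composition scheme is not established by the abstract.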

Code & Models

Repositories

danielsc4/dynamic-activation-composition (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques