Compositional Steering of Large Language Models with Steering Tokens
Gorjan Radevski, Kiril Gashteovski, Giwon Hong, Carolin Lawrence, Goran Glavaš

TL;DR
This paper introduces compositional steering tokens that enable simultaneous multi-behavior control of large language models, improving zero-shot composition and combining effectively with natural language instructions.
Contribution
The paper proposes a method for multi-behavior steering that embeds each behavior into a dedicated token via self-distillation, enabling more effective zero-shot compositional control of LLMs.
Findings
Steering tokens outperform existing methods in multi-behavior control.
The approach generalizes well to unseen behavior combinations.
Combining steering tokens with natural language instructions yields further improvements.
Abstract
Deploying LLMs in real-world applications requires controllable output that satisfies multiple desiderata at the same time. While existing work extensively addresses LLM steering for a single behavior, compositional steering, i.e., steering LLMs simultaneously towards multiple behaviors, remains an underexplored problem. In this work, we propose compositional steering tokens for multi-behavior steering. We first embed individual behaviors, expressed as natural language instructions, into dedicated tokens via self-distillation. Contrary to most prior work, which operates in the activation space, our behavior steers live in the space of input tokens, enabling more effective zero-shot composition. We then train a dedicated composition token on pairs of behaviors and show that it successfully captures the notion of composition: it generalizes well to…
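To make the self-distillation step more concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a hypothetical frozen "language model" (`toy_lm`, a mean-pooling linear layer with a softmax), random embeddings standing in for an instruction and for prompts, and a KL-divergence objective; the abstract only states that behaviors expressed as natural language instructions are embedded into dedicated tokens via self-distillation, so the loss, shapes, and hyperparameters here are all assumptions. The teacher sees the full instruction, the student sees a single learnable input vector in its place, and only that vector is updated.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_lm(input_embs, W):
    """Hypothetical frozen 'LM': mean-pool the input embeddings, project to vocab."""
    dim = len(input_embs[0])
    n = len(input_embs)
    pooled = [sum(e[d] for e in input_embs) / n for d in range(dim)]
    return softmax([sum(w[d] * pooled[d] for d in range(dim)) for w in W])


def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_steering_token(W, instruction_embs, prompts, dim,
                           steps=500, lr=0.2, seed=0):
    """Learn one input-space vector that mimics the effect of the instruction.

    Assumed objective: mean KL(teacher || student) over the prompts, where the
    teacher is the frozen model given [instruction + prompt] and the student is
    the same frozen model given [steering vector + prompt].
    """
    rng = random.Random(seed)
    steer = [rng.gauss(0, 0.1) for _ in range(dim)]
    history = []
    for _ in range(steps):
        grad = [0.0] * dim
        loss = 0.0
        for prompt in prompts:
            teacher = toy_lm(instruction_embs + prompt, W)
            student_inputs = [steer] + prompt
            student = toy_lm(student_inputs, W)
            loss += kl(teacher, student)
            # Analytic gradient: dKL/dlogits = student - teacher, and the
            # logits depend on the steering vector through W / n (mean pooling).
            n = len(student_inputs)
            for v, w in enumerate(W):
                diff = student[v] - teacher[v]
                for d in range(dim):
                    grad[d] += diff * w[d] / n
        # Only the steering vector is updated; the model weights W stay frozen.
        steer = [s - lr * g / len(prompts) for s, g in zip(steer, grad)]
        history.append(loss / len(prompts))
    return steer, history


# Toy data: every embedding below is random and purely illustrative.
rng = random.Random(1)
dim, vocab = 4, 6
W = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]
instruction = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
prompts = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
           for _ in range(8)]

steer, history = distill_steering_token(W, instruction, prompts, dim)
print(f"mean KL before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

Because the learned vector lives in the input space, a second behavior's vector could in principle be placed next to it in `student_inputs`, which is the kind of input-level composition the abstract contrasts with activation-space steering. The separately trained composition token is not modeled in this sketch.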
