Compositional Steering of Large Language Models with Steering Tokens

Gorjan Radevski; Kiril Gashteovski; Giwon Hong; Carolin Lawrence; Goran Glavaš

arXiv:2601.05062·cs.CL·April 21, 2026

TL;DR

This paper introduces compositional steering tokens that enable simultaneous multi-behavior control of large language models, improving zero-shot composition and combining effectively with natural language instructions.

Contribution

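The paper proposes a method for multi-behavior steering: each behavior, expressed as a natural language instruction, is embedded into a dedicated input token via self-distillation, and a separate composition token is trained on pairs of behaviors, enabling better zero-shot compositional control of LLMs.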
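To make the self-distillation idea concrete, here is a minimal toy sketch, not the authors' implementation. It assumes that "self-distillation" means matching the frozen model's next-token distribution under the full instruction (teacher) with its distribution under a single trainable token embedding (student), using a KL objective. The toy model, its dimensions, and all names (`forward`, `steer`, `mean_kl`) are hypothetical and chosen only for illustration.

```python
import math
import random

random.seed(0)
D, V = 4, 5  # toy embedding size and vocabulary size (arbitrary choices)

# Frozen toy "LLM" (an assumption): logits = W @ tanh(sum of input embeddings).
W = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(V)]

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    total = sum(e)
    return [x / total for x in e]

def forward(embs):
    """Return (hidden state, next-token distribution) for a list of input embeddings."""
    h = [math.tanh(sum(e[i] for e in embs)) for i in range(D)]
    logits = [sum(W[v][i] * h[i] for i in range(D)) for v in range(V)]
    return h, softmax(logits)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def rand_vec(scale=0.5):
    return [random.uniform(-scale, scale) for _ in range(D)]

# Teacher input: the behavior written out as a multi-token instruction (3 toy embeddings).
instruction = [rand_vec() for _ in range(3)]
# A handful of toy prompts, each a short list of token embeddings.
prompts = [[rand_vec() for _ in range(random.randint(1, 3))] for _ in range(8)]

def mean_kl(steer):
    """Average KL(teacher || student) over the toy prompts."""
    return sum(
        kl(forward(instruction + p)[1], forward([steer] + p)[1]) for p in prompts
    ) / len(prompts)

# Student input: one trainable steering-token embedding; the model itself stays frozen.
steer = [0.0] * D
initial_loss = mean_kl(steer)

lr = 0.1
for _ in range(2000):
    grad = [0.0] * D
    for p in prompts:
        _, teacher = forward(instruction + p)
        h, student = forward([steer] + p)
        # Gradient of KL(teacher || student) w.r.t. the student logits is (student - teacher).
        dlogits = [s - t for s, t in zip(student, teacher)]
        for i in range(D):
            dh = sum(W[v][i] * dlogits[v] for v in range(V))
            grad[i] += dh * (1.0 - h[i] ** 2) / len(prompts)  # backprop through tanh
    steer = [s - lr * g for s, g in zip(steer, grad)]

final_loss = mean_kl(steer)
print(f"mean KL before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

Only the steering embedding receives updates, so the learned behavior lives entirely in input-token space. Under the same assumptions, steering towards several behaviors at inference time would amount to placing several such learned embeddings in the input; training the dedicated composition token is not sketched here.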

Findings

01. Steering tokens outperform existing methods in multi-behavior control.

02. The approach generalizes well to unseen behavior combinations.

03. Combining steering tokens with natural language instructions yields further improvements.

Abstract

Deploying LLMs in real-world applications requires controllable output that satisfies multiple desiderata at the same time. While existing work extensively addresses LLM steering for a single behavior, compositional steering, i.e., steering LLMs simultaneously towards multiple behaviors, remains an underexplored problem. In this work, we propose compositional steering tokens for multi-behavior steering. We first embed individual behaviors, expressed as natural language instructions, into dedicated tokens via self-distillation. Contrary to most prior work, which operates in the activation space, our behavior steers live in the space of input tokens, enabling more effective zero-shot composition. We then train a dedicated composition token on pairs of behaviors and show that it successfully captures the notion of composition: it generalizes well to…
