Conceptors for Semantic Steering

Ilias Triantafyllopoulos; Young-Min Cho; Ren Tao; Miranda Muqing Miao; Sunny Rai; Lyle Ungar; Sharath Chandra Guntuku; Neville Ryant; João Sedoc

arXiv:2605.04980·cs.LG·May 7, 2026

TL;DR

This paper applies conceptors, soft projection matrices estimated from activations, to steering large language models. Instead of reducing each concept to a single direction, a conceptor preserves the concept's full multidimensional subspace.

Contribution

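It proposes conceptors, soft projection matrices estimated from activations pooled across both poles of a bipolar concept, for semantic steering. It adds a geometric analysis of the resulting subspace and uses the conceptor quota as a parameter-free layer-selection diagnostic that predicts concept separability.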

Findings

01

The bipolar conceptor subspace strictly subsumes the single-vector baseline.

02

The conceptor quota predicts concept separability, with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions.

03

Conceptors outperform additive baselines in multi-dimensional subspace scenarios.
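
The subspace-inclusion finding can be illustrated with a small numerical sketch. It assumes the single-vector baseline is a difference-of-means direction, which this page does not confirm, and it uses synthetic activations with a hypothetical hidden size. Because the pole means are linear combinations of the pooled activations, their difference lies in the pooled span, while that span has many more than one dimension. The sketch uses a hard span, not the soft conceptor the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                                   # hypothetical hidden size
pos = rng.normal(size=(10, d)) + 1.0     # synthetic activations, one pole
neg = rng.normal(size=(10, d)) - 1.0     # synthetic activations, opposite pole

# Assumed single-vector baseline: difference of the two pole means.
v = pos.mean(axis=0) - neg.mean(axis=0)

# Orthonormal basis for the span of the activations pooled across both poles.
pooled = np.vstack([pos, neg])
_, s, vt = np.linalg.svd(pooled, full_matrices=False)
basis = vt[s > 1e-10]                    # rows span the pooled activations

# Project v onto the pooled span; the residual is what falls outside it.
residual = v - basis.T @ (basis @ v)
rank = basis.shape[0]

print("pooled subspace dimension:", rank)
print("norm of v outside the pooled span:", np.linalg.norm(residual))
```

The residual is zero up to floating-point error, and the pooled subspace dimension exceeds one, which is the sense in which a single direction is a special case of the pooled subspace in this toy setting.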

Abstract

Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT): we evaluate conceptor compositionality on thematically related…
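
The abstract does not spell out formulas, so the following is a minimal sketch that assumes the standard conceptor definitions from the earlier conceptor literature: C = R (R + aperture^-2 I)^-1 with R the activation correlation matrix, quota as the mean eigenvalue of C, NOT C = I - C, C AND B = (C^-1 + B^-1 - I)^-1, and OR obtained through De Morgan's law. The aperture value, the synthetic activations, the hidden size, and applying the conceptor to a hidden state as `C @ h` are all illustrative assumptions, not the paper's confirmed setup.

```python
import numpy as np

def conceptor(X, aperture=1.0):
    """Soft projection C = R (R + aperture^-2 I)^-1 from activations X of shape (n, d)."""
    n, d = X.shape
    R = X.T @ X / n                       # uncentered correlation matrix
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def quota(C):
    """Mean eigenvalue of C, computed as trace(C) / d."""
    return float(np.trace(C)) / C.shape[0]

def NOT(C):
    return np.eye(C.shape[0]) - C

def AND(C, B):
    # Inverse-based form; requires C and B to be invertible (holds here since n > d).
    I = np.eye(C.shape[0])
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - I)

def OR(C, B):
    # De Morgan: C OR B = NOT(NOT C AND NOT B).
    return NOT(AND(NOT(C), NOT(B)))

rng = np.random.default_rng(0)
d = 8                                     # hypothetical hidden size
scale_a = np.linspace(2.0, 0.1, d)        # concept A varies most along early axes
scale_b = scale_a[::-1]                   # concept B varies most along late axes
X_a = rng.normal(size=(200, d)) * scale_a # synthetic pooled activations, concept A
X_b = rng.normal(size=(200, d)) * scale_b # synthetic pooled activations, concept B

C_a, C_b = conceptor(X_a), conceptor(X_b)

h = rng.normal(size=d)                    # stand-in for one hidden state
h_steered = C_a @ h                       # soft projection onto concept A's subspace

print("quota A:", quota(C_a))
print("quota A AND B:", quota(AND(C_a, C_b)))
print("quota A OR B:", quota(OR(C_a, C_b)))
```

Under these definitions the quota lies strictly between 0 and 1, AND can only shrink it, and OR can only grow it, which makes the quota a convenient scalar summary of how much of the activation space a concept occupies at a given layer.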
