Conceptors for Semantic Steering
Ilias Triantafyllopoulos, Young-Min Cho, Ren Tao, Miranda Muqing Miao, Sunny Rai, Lyle Ungar, Sharath Chandra Guntuku, Neville Ryant, João Sedoc

TL;DR
This paper applies conceptors, soft projection matrices that preserve a concept's full multidimensional subspace, to steer large language models, improving control and safety over traditional single-direction methods.
Contribution
It proposes the use of conceptors as multidimensional projection matrices for semantic steering, with a geometric analysis and a new diagnostic for concept separability.
Findings
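The bipolar conceptor subspace strictly subsumes the single-vector baseline.
The conceptor quota predicts concept separability, with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions.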
Conceptors outperform additive baselines in multi-dimensional subspace scenarios.
Abstract
Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT): we evaluate conceptor compositionality on thematically related…
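The abstract does not spell out the formulas, so the following is only a minimal sketch of the ideas it names, assuming the standard conceptor definitions from the reservoir-computing literature: C = R (R + aperture^-2 I)^-1 for the activation correlation matrix R, quota as the mean singular value of C, and the closed-form NOT, AND, and OR. The synthetic "activations", the aperture value, and all function names are illustrative assumptions, not the authors' code or settings.

```python
import numpy as np

def conceptor(X, aperture=3.0):
    """Soft projection matrix from activations X of shape (n_samples, d).

    Assumed standard form: C = R (R + aperture^-2 I)^-1, with R = X^T X / n.
    """
    n, d = X.shape
    R = X.T @ X / n
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def quota(C):
    """Mean singular value of C: the fraction of the space the conceptor claims."""
    return np.trace(C) / C.shape[0]

def NOT(C):
    return np.eye(C.shape[0]) - C

def AND(C, B):
    # Closed form valid when C and B are invertible (true here: full-rank R, finite aperture).
    I = np.eye(C.shape[0])
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - I)

def OR(C, B):
    # De Morgan: C OR B = NOT(NOT C AND NOT B).
    return NOT(AND(NOT(C), NOT(B)))

# Synthetic stand-in for layer activations (hypothetical data, d = 8).
rng = np.random.default_rng(0)
d = 8
scales = np.array([1.0, 1.0] + [0.1] * (d - 2))  # the concept lives mostly in 2 dimensions
offset = np.zeros(d)
offset[0] = 3.0                                   # the two poles differ along dimension 0

pos = rng.normal(size=(200, d)) * scales + offset  # positive pole
neg = rng.normal(size=(200, d)) * scales - offset  # negative pole

# Bipolar conceptor: pool activations from both poles before estimating C.
C = conceptor(np.vstack([pos, neg]))

# Single-vector baseline: the difference of pole means.
v = pos.mean(axis=0) - neg.mean(axis=0)
retained = np.linalg.norm(C @ v) / np.linalg.norm(v)  # how much of v survives the soft projection

# A second conceptor from unrelated synthetic data, to exercise the Boolean operations.
B = conceptor(rng.normal(size=(200, d)) * scales[::-1])

print("quota(C)      =", quota(C))
print("retained      =", retained)
print("quota(C AND B) =", quota(AND(C, B)))
print("quota(C OR B)  =", quota(OR(C, B)))
```

In this toy setting the difference-of-means direction passes through the pooled conceptor almost unchanged, which illustrates in miniature the claim that the bipolar subspace contains the single-vector baseline. How the paper turns quota into a layer-selection score is not described in the abstract and is not reproduced here.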
