Concept Heterogeneity-aware Representation Steering

Laziz U. Abdullaev; Noelle Y. L. Wong; Ryan T. Z. Lee; Shiqi Jiang; Khoi N. M. Nguyen; Tan M. Nguyen

arXiv:2603.02237·cs.LG·March 4, 2026

TL;DR

This paper introduces CHaRS, a novel method for representation steering in large language models that accounts for concept heterogeneity by modeling representations as Gaussian mixtures and using optimal transport for input-dependent control.

Contribution

The paper proposes Concept Heterogeneity-aware Representation Steering (CHaRS), a new approach that models representation heterogeneity with Gaussian mixtures and applies optimal transport for more effective steering.

Findings

01 CHaRS outperforms global steering in various experimental settings.

02 Modeling representation heterogeneity improves control precision.

03 Input-dependent steering maps enhance behavioral control.

Abstract

Representation steering offers a lightweight mechanism for controlling the behavior of large language models (LLMs) by intervening on internal activations at inference time. Most existing methods rely on a single global steering direction, typically obtained via difference-in-means over contrastive datasets. This approach implicitly assumes that the target concept is homogeneously represented across the embedding space. In practice, however, LLM representations can be highly non-homogeneous, exhibiting clustered, context-dependent structure, which renders global steering directions brittle. In this work, we view representation steering through the lens of optimal transport (OT), noting that standard difference-in-means steering implicitly corresponds to the OT map between two unimodal Gaussian distributions with identical covariance, yielding a global translation. To relax this…
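The abstract's remark that difference-in-means steering matches the OT map between two equal-covariance Gaussians can be checked against the standard closed-form quadratic-cost OT map between Gaussians. The sketch below is a worked derivation under assumed notation, not the paper's own: $\mu_-, \Sigma_-$ and $\mu_+, \Sigma_+$ are hypothetical symbols for the mean and covariance of the source (negative) and target (positive) activation distributions, with $\Sigma_-$ assumed positive definite.

```latex
% Assumed notation (not from the paper): source N(\mu_-, \Sigma_-), target N(\mu_+, \Sigma_+).
% Closed-form quadratic-cost OT map between two Gaussians:
\begin{align}
T(x) &= \mu_+ + A\,(x - \mu_-), \\
A &= \Sigma_-^{-1/2}\left(\Sigma_-^{1/2}\,\Sigma_+\,\Sigma_-^{1/2}\right)^{1/2}\Sigma_-^{-1/2}.
\end{align}
% Identical covariances, \Sigma_- = \Sigma_+ = \Sigma:
\begin{align}
A &= \Sigma^{-1/2}\left(\Sigma^{1/2}\,\Sigma\,\Sigma^{1/2}\right)^{1/2}\Sigma^{-1/2}
   = \Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2} = I, \\
T(x) &= x + (\mu_+ - \mu_-).
\end{align}
```

In this special case the map shifts every activation by the same vector $\mu_+ - \mu_-$, which is the difference-in-means direction once the two means are estimated from the contrastive datasets. When the covariances differ, or when either distribution is multimodal, the map is no longer a single translation.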

Taxonomy

Topics: Topic Modeling · Bayesian Methods and Mixture Models · Domain Adaptation and Few-Shot Learning