Learning Distribution-Wise Control in Representation Space for Language Models

Chunyuan Deng; Ruidi Chang; Hanjie Chen

arXiv:2506.06686·cs.CL·June 10, 2025

Open Access

TL;DR

This paper introduces a distribution-wise intervention method for language models. Rather than learning only a pointwise transformation within a concept subspace, the intervention also learns the region surrounding it, which gives better controllability and robustness over high-level behaviors than pointwise methods.

Contribution

The paper extends representation fine-tuning (learnable pointwise interventions) to the distribution level, enabling more comprehensive and effective control of language model behaviors.

Findings

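01. Across eight commonsense reasoning and seven arithmetic reasoning benchmarks, distribution-wise interventions consistently outperform pointwise interventions in controllability and robustness.

02. Larger standard deviations in the interventions correlate strongly with improved performance.

03. The interventions perform effectively in the early layers of language models.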

Abstract

Interventions in language models (LMs) are applied strategically to steer model behavior during the forward pass. Learnable interventions, also known as representation fine-tuning, aim to apply pointwise control within the concept subspace and have proven effective in altering high-level behaviors. In this work, we extend this approach to the distribution level, enabling the model to learn not only pointwise transformations but also the surrounding regions of the concept subspace. We demonstrate that these methods perform effectively in early layers, with larger standard deviations correlating strongly with improved performance. Across eight commonsense reasoning and seven arithmetic reasoning benchmarks, our distribution-wise interventions consistently outperform pointwise interventions in controllability and robustness. These results illustrate that distribution-wise interventions…
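The contrast between pointwise and distribution-wise control can be illustrated with a toy sketch. This is not the authors' implementation: it assumes, for illustration only, that the distribution-wise edit samples a shift from a per-dimension Gaussian using the reparameterization form `mean + std * eps`, and the names `pointwise_intervention`, `distribution_wise_intervention`, `mean`, and `log_std` are hypothetical. It uses plain Python lists in place of real hidden states.

```python
import math
import random


def pointwise_intervention(h, shift):
    """Deterministic edit: add one fixed vector to the hidden state."""
    return [x + s for x, s in zip(h, shift)]


def distribution_wise_intervention(h, mean, log_std, rng):
    """Stochastic edit: sample the shift from N(mean, std^2) per dimension.

    Assumed form (not taken from the paper): shift = mean + std * eps,
    with eps ~ N(0, 1), so the edit covers a region around the mean.
    """
    out = []
    for x, m, ls in zip(h, mean, log_std):
        eps = rng.gauss(0.0, 1.0)
        out.append(x + m + math.exp(ls) * eps)
    return out


rng = random.Random(0)
h = [0.5, -1.0, 2.0]          # toy hidden state
mean = [0.1, 0.2, -0.3]       # hypothetical mean shift
log_std = [-2.0, -2.0, -2.0]  # hypothetical log standard deviation

point = pointwise_intervention(h, mean)
samples = [distribution_wise_intervention(h, mean, log_std, rng)
           for _ in range(2000)]
# Averaging many sampled edits recovers the pointwise edit.
avg = [sum(s[i] for s in samples) / len(samples) for i in range(len(h))]

# With zero standard deviation the two interventions coincide.
zero_std = [float("-inf")] * len(h)
collapsed = distribution_wise_intervention(h, mean, zero_std, rng)
```

In this sketch the pointwise edit is the special case of the distribution-wise edit with zero standard deviation; a larger `log_std` widens the region of representation space the edit covers.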

Videos

Learning Distribution-wise Control in Representation Space for Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)