Toward a Flexible Framework for Linear Representation Hypothesis Using Maximum Likelihood Estimation

Trung Nguyen; Yan Leng

arXiv:2502.16385·cs.LG·February 25, 2025

TL;DR

This paper introduces SAND, a method that applies maximum likelihood estimation to LLM activation differences to derive concept directions, removing the earlier framework's reliance on single-token counterfactual pairs and extending it to complex or context-dependent concepts.

Contribution

We propose SAND, a novel MLE-based approach that models activation differences as samples from a von Mises-Fisher (vMF) distribution to compute concept directions without relying on the unembedding space or on single-token counterfactual pairs.
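
To make the vMF modeling step concrete, the following is the standard textbook derivation of the maximum likelihood mean direction of a von Mises-Fisher distribution. The symbols $x_i$ (unit-normalized activation differences), $\mu$ (concept direction), and $\kappa$ (concentration) are notation assumed here for illustration and may differ from the paper's; any mapping into the paper's canonical representation space is omitted.

```latex
% vMF density on the unit sphere S^{d-1}, with \|\mu\| = 1 and \kappa > 0
f(x;\mu,\kappa) = C_d(\kappa)\,\exp\!\left(\kappa\,\mu^{\top}x\right)

% Log-likelihood of n i.i.d. samples x_1,\dots,x_n
\ell(\mu,\kappa) = n\log C_d(\kappa) + \kappa\,\mu^{\top}\sum_{i=1}^{n} x_i

% For any fixed \kappa > 0, maximizing over \|\mu\| = 1 aligns \mu with the resultant vector
\hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{\left\lVert \sum_{i=1}^{n} x_i \right\rVert}
```

Under these assumptions the estimate is a normalized sum of unit-length differences, and it does not depend on the value of $\kappa$.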

Findings

01. SAND outperforms previous methods in activation engineering tasks.

02. The approach is more flexible and applicable to complex, context-dependent concepts.

03. Experiments show improved monitoring and manipulation of LLM representations.

Abstract

The linear representation hypothesis posits that high-level concepts are encoded as linear directions in the representation spaces of LLMs. Park et al. (2024) formalize this notion by unifying multiple interpretations of linear representation, such as 1-dimensional subspace representation and interventions, using a causal inner product. However, their framework relies on single-token counterfactual pairs and cannot handle ambiguous contrasting pairs, limiting its applicability to complex or context-dependent concepts. We introduce a new notion of binary concepts as unit vectors in a canonical representation space, and utilize LLMs' (neural) activation differences along with maximum likelihood estimation (MLE) to compute concept directions (i.e., steering vectors). Our method, Sum of Activation-base Normalized Difference (SAND), formalizes the use of activation differences modeled as samples…
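
The idea of estimating a concept direction from activation differences can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes each difference is normalized to unit length and treated as a vMF sample, so that the MLE of the mean direction is the normalized sum of those unit vectors. The function names `estimate_direction` and `steer`, the synthetic activations, and the additive steering rule are all hypothetical, and any transformation into the paper's canonical representation space is left out.

```python
import numpy as np


def estimate_direction(pos_acts, neg_acts, eps=1e-12):
    """Normalized sum of unit-normalized activation differences.

    pos_acts, neg_acts: arrays of shape (n_pairs, d) holding activations
    for contrasting inputs (hypothetical data layout).
    """
    diffs = pos_acts - neg_acts
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    unit_diffs = diffs / np.maximum(norms, eps)  # project each difference onto the unit sphere
    resultant = unit_diffs.sum(axis=0)           # sum of unit vectors
    return resultant / np.linalg.norm(resultant)  # vMF mean-direction MLE


def steer(activation, direction, alpha):
    """Shift an activation along the direction (a common activation-addition convention, assumed here)."""
    return activation + alpha * direction


# Synthetic check: contrasting pairs that differ mainly along one known axis.
rng = np.random.default_rng(0)
d, n = 16, 200
true_dir = np.zeros(d)
true_dir[0] = 1.0
neg = rng.normal(size=(n, d))
pos = neg + 2.0 * true_dir + 0.3 * rng.normal(size=(n, d))

direction = estimate_direction(pos, neg)
cosine = float(direction @ true_dir)  # alignment with the planted direction

steered = steer(neg[0], direction, alpha=4.0)
shift = float((steered - neg[0]) @ direction)  # projection gained by steering
```

On this synthetic data the recovered direction should align closely with the planted axis, and steering changes the projection onto the direction by exactly `alpha` because the direction has unit norm.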

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Neural Networks and Applications

Methods: LLaMA