Sparse Feature Coactivation Reveals Causal Semantic Modules in Large Language Models

Ruixuan Deng; Xiaoyang Hu; Miles Gilberti; Shane Storks; Aman Taxali; Mike Angstadt; Chandra Sripada; Joyce Chai

arXiv:2506.18141·cs.CL·April 21, 2026

TL;DR

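This paper uses coactivation of sparse autoencoder (SAE) features to uncover modular semantic components in large language models. Ablating or amplifying these components changes model outputs in targeted ways, and their distribution across layers shows concepts emerging early and relations concentrating later.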

Contribution

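It introduces a method to identify semantic modules in LLMs from SAE feature coactivation on a handful of prompts, demonstrates their causal role through ablation and amplification, and shows that concept and relation components occupy different depths of the model.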

Findings

01. Ablating identified components alters model outputs predictably.

02. Amplifying components induces counterfactual responses.

03. Relation components are concentrated in later layers.

Abstract

We identify semantically coherent, context-consistent network components in large language models (LLMs) using coactivation of sparse autoencoder (SAE) features collected from just a handful of prompts. Focusing on concept-relation prediction tasks, we show that ablating these components for concepts (e.g., countries and words) and relations (e.g., capital city and translation language) changes model outputs in predictable ways, while amplifying these components induces counterfactual responses. Notably, composing relation and concept components yields compound counterfactual outputs. Further analysis reveals that while most concept components emerge from the very first layer, more abstract relation components are concentrated in later layers. Lastly, we show that extracted components more comprehensively capture concepts and relations than individual features while maintaining…
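
To make the idea of coactivation-based components concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how coactivation is measured or how features are grouped, so the Jaccard overlap, the `threshold` value, the connected-components grouping, and the function names `coactivation_components` and `scale_component` are all assumptions made for illustration. The input is a toy table of SAE feature activations (one row per token or prompt), not output from a real model.

```python
from itertools import combinations


def coactivation_components(acts, threshold=0.5, eps=1e-9):
    """Group features that tend to fire on the same rows.

    acts: list of rows, each a list of non-negative feature activations.
    Assumption: coactivation is measured as Jaccard overlap of the rows on
    which two features are active, and components are the connected
    components of the graph with edges where overlap >= threshold.
    """
    n_feat = len(acts[0])
    # Rows on which each feature is active.
    active = [{r for r, row in enumerate(acts) if row[f] > eps}
              for f in range(n_feat)]

    # Union-find over feature indices.
    parent = list(range(n_feat))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n_feat), 2):
        union = active[i] | active[j]
        if not union:
            continue  # neither feature ever fires
        jaccard = len(active[i] & active[j]) / len(union)
        if jaccard >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for f in range(n_feat):
        if active[f]:
            groups.setdefault(find(f), set()).add(f)
    # Keep only groups with more than one feature.
    return [g for g in groups.values() if len(g) > 1]


def scale_component(row, component, factor):
    """Scale the features in one component; factor=0 ablates, factor>1 amplifies."""
    return [a * factor if f in component else a for f, a in enumerate(row)]


# Toy activations: features 0 and 1 fire together, as do features 2 and 3.
toy_acts = [
    [1.0, 0.8, 0.0, 0.0],
    [0.9, 1.1, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.6],
    [0.0, 0.0, 0.5, 0.0],
]
components = coactivation_components(toy_acts, threshold=0.5)
ablated = scale_component(toy_acts[0], components[0], 0.0)
amplified = scale_component(toy_acts[0], components[0], 2.0)
```

In an actual experiment the scaled feature activations would have to be decoded back through the SAE and written into the model's hidden state before the effect on outputs could be observed; that step depends on the model and SAE tooling and is left out here.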
