Sparsification and Reconstruction from the Perspective of Representation Geometry

Wenjie Sun; Bingzhe Wu; Zhile Yang; Chengke Wu

arXiv:2505.22506 · cs.LG · May 29, 2025

TL;DR

This paper explores how sparse autoencoders organize language model representations through geometric analysis, revealing their impact on feature disentanglement and reconstruction, and providing insights for interpretability tools.

Contribution

It introduces the SAEMA framework to analyze the geometric structure of sparse representations and links representational separability to reconstruction performance.

Findings

01

Stratified structure of representations validated by rank variability of SSPD matrices.

02

Sparse encoding amplifies inter-feature distinctions and increases dimensionality.

03

Separable global representations causally improve reconstruction quality.

Abstract

Sparse Autoencoders (SAEs) have emerged as a predominant tool in mechanistic interpretability, aiming to identify interpretable monosemantic features. However, how does sparse encoding organize the representations of activation vectors from language models? What is the relationship between this organizational paradigm and feature disentanglement as well as reconstruction performance? To address these questions, we propose SAEMA, which validates the stratified structure of the representation by observing how the rank of the symmetric semipositive definite (SSPD) matrix, corresponding to the modal tensor unfolded along the latent tensor, varies with the level of noise added to the residual stream. To systematically investigate how sparse encoding alters representational structures, we define local and global representations, demonstrating that they amplify inter-feature…
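The rank-versus-noise idea in the abstract can be illustrated with a toy sketch. This is not the authors' SAEMA procedure: the tensor shape, the planted low-rank structure, the noise levels, and the helper names `unfold` and `gram_rank` are all assumptions made for illustration. It unfolds a synthetic three-way tensor along its latent axis, forms the Gram matrix of the unfolding (symmetric and positive semidefinite by construction), and counts its numerically nonzero eigenvalues as Gaussian noise is added.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def gram_rank(tensor, mode, tol=1e-8):
    """Numerical rank of the Gram matrix M @ M.T of the mode unfolding M."""
    m = unfold(tensor, mode)
    gram = m @ m.T  # symmetric positive semidefinite by construction
    eigvals = np.linalg.eigvalsh(gram)
    # Count eigenvalues above a tolerance relative to the largest one.
    return int(np.sum(eigvals > tol * eigvals.max()))


rng = np.random.default_rng(0)

# Hypothetical latent tensor with axes (latent, tokens, samples).
# A rank-3 structure is planted along the latent axis (an assumption).
latent, tokens, samples, true_rank = 16, 8, 10, 3
basis = rng.standard_normal((latent, true_rank))
coeffs = rng.standard_normal((true_rank, tokens, samples))
clean = np.einsum("lr,rts->lts", basis, coeffs)

# Track the numerical rank as the noise level grows (levels are illustrative).
ranks = {}
for sigma in (0.0, 1e-2, 1.0):
    noisy = clean + sigma * rng.standard_normal(clean.shape)
    ranks[sigma] = gram_rank(noisy, mode=0)

print(ranks)
```

On the clean tensor the numerical rank equals the planted rank, and it grows toward the full latent dimension once noise is added. How sharply it changes depends on the tolerance `tol`, which is a free choice in this sketch.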

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks