The Knowledge Microscope: Features as Better Analytical Lenses than Neurons

Yuheng Chen; Pengfei Cao; Kang Liu; Jun Zhao

arXiv:2502.12483 · cs.CL · February 28, 2025

TL;DR

This paper proposes features derived from Sparse Autoencoders, rather than neurons, as the unit of analysis for factual knowledge in language models. Compared with neurons, features offer better interpretability, stronger monosemanticity, and better privacy protection.

Contribution

The study introduces features as an alternative to neurons for analyzing language models, demonstrating their advantages in interpretability, semantic clarity, and privacy preservation.

Findings

01 Features have a stronger influence on knowledge expression than neurons.

02 Features show enhanced monosemanticity, with distinct activation patterns for related and unrelated facts.

03 FeatureEdit, the proposed editing method, effectively erases privacy-sensitive information.

Abstract

Previous studies primarily utilize MLP neurons as units of analysis for understanding the mechanisms of factual knowledge in Language Models (LMs); however, neurons suffer from polysemanticity, leading to limited knowledge expression and poor interpretability. In this paper, we first conduct preliminary experiments to validate that Sparse Autoencoders (SAE) can effectively decompose neurons into features, which serve as alternative analytical units. With this established, our core findings reveal three key advantages of features over neurons: (1) Features exhibit stronger influence on knowledge expression and superior interpretability. (2) Features demonstrate enhanced monosemanticity, showing distinct activation patterns between related and unrelated facts. (3) Features achieve better privacy protection than neurons, demonstrated through our proposed FeatureEdit method, which…
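As a rough illustration of what "decomposing neurons into features" can mean, the sketch below implements a toy sparse autoencoder in plain Python. It assumes the common formulation (ReLU encoder, linear decoder, squared reconstruction error plus an L1 sparsity penalty); the architecture actually used in the paper is not given on this page. The class name `TinySAE`, the dimensions, the `l1_coeff` value, and the random stand-in activation vector are all hypothetical, and the weights are untrained.

```python
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinySAE:
    """Toy SAE (assumed form): f = ReLU(W_enc x + b_enc), x_hat = W_dec f + b_dec."""

    def __init__(self, n_neurons, n_features, seed=0):
        rng = random.Random(seed)
        # Overcomplete dictionary: more features than neurons (an assumption).
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(n_neurons)]
                      for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(n_features)]
                      for _ in range(n_neurons)]
        self.b_dec = [0.0] * n_neurons

    def encode(self, x):
        """Map neuron activations to non-negative feature activations."""
        pre = [a + b for a, b in zip(matvec(self.W_enc, x), self.b_enc)]
        return [p if p > 0.0 else 0.0 for p in pre]

    def decode(self, f):
        """Reconstruct neuron activations from feature activations."""
        return [a + b for a, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        """Squared reconstruction error plus an L1 penalty on the features."""
        f = self.encode(x)
        x_hat = self.decode(f)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        sparsity = sum(abs(a) for a in f)
        return recon + l1_coeff * sparsity


# Stand-in for one MLP activation vector (random data, not from a real model).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(8)]

sae = TinySAE(n_neurons=8, n_features=32)
features = sae.encode(x)
reconstruction = sae.decode(features)
```

After training such a model to minimize `loss` over many activation vectors, each entry of `features` would be a candidate unit of analysis in place of an individual neuron.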

Taxonomy

Topics: Neural Networks and Applications · Image Processing Techniques and Applications