Sparse Crosscoders for diffing MoEs and Dense models
Marmik Chaudhari, Nishkal Hundia, Idhant Gulati

TL;DR
This paper compares the internal representations of sparse Mixture of Experts (MoE) models and dense models, revealing that MoEs develop more specialized and focused features, while dense models have broader, more general features.
Contribution
It introduces a systematic method using crosscoders to analyze and compare MoE and dense model internals, highlighting differences in feature organization and specialization.
Findings
MoEs learn fewer unique features than dense models.
MoE-specific features have higher activation density than shared features, whereas dense-specific features have lower density.
Dense models distribute information across more general features.
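To make the density finding concrete, here is a minimal sketch of one way activation density could be measured. It assumes density means the fraction of tokens on which a feature is non-zero, which is a common convention but is not confirmed by this page. The function names and the group labels ("shared", "moe", "dense") are hypothetical.

```python
def activation_density(acts):
    """Fraction of tokens on which each feature is active (non-zero).

    acts: list of rows, one row per token, one column per crosscoder feature.
    Assumption: "density" means firing frequency; the paper may define it differently.
    """
    n_tokens = len(acts)
    n_features = len(acts[0])
    counts = [0] * n_features
    for row in acts:
        for j, a in enumerate(row):
            if a > 0:
                counts[j] += 1
    return [c / n_tokens for c in counts]


def mean_density_by_group(density, groups):
    """Average density per feature group.

    groups: one hypothetical label per feature, e.g. "shared", "moe", "dense".
    """
    totals, sizes = {}, {}
    for d, g in zip(density, groups):
        totals[g] = totals.get(g, 0.0) + d
        sizes[g] = sizes.get(g, 0) + 1
    return {g: totals[g] / sizes[g] for g in totals}


# Toy data: 4 tokens, 3 features (illustrative values, not results from the paper).
toy_acts = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0],
    [1.0, 0.0, 0.0],
]
toy_density = activation_density(toy_acts)
toy_means = mean_density_by_group(toy_density, ["moe", "shared", "dense"])
```

Comparing the per-group means is the kind of check that would show whether model-specific features fire more or less often than shared ones.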
Abstract
Mixture of Experts (MoE) models achieve parameter-efficient scaling through sparse expert routing, yet their internal representations remain poorly understood compared to those of dense models. We present a systematic comparison of MoE and dense model internals using crosscoders, a variant of sparse autoencoders that jointly models multiple activation spaces. We train 5-layer dense and MoE models (with equal active parameters) on 1B tokens across code, scientific text, and English stories. Using BatchTopK crosscoders with explicitly designated shared features, we achieve fractional variance explained and uncover concrete differences in feature organization. The MoE learns significantly fewer unique features than the dense model. MoE-specific features also exhibit higher activation density than shared features, whereas dense-specific features show lower density. Our analysis reveals that MoEs…
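The two technical ingredients named in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: BatchTopK is taken to mean keeping the k × batch_size largest positive pre-activations across the whole batch (so individual tokens may use more or fewer than k features), and fractional variance explained is taken to be 1 minus the residual sum of squares over the total variance around the per-dimension mean. The function names are hypothetical.

```python
def batch_topk(pre_acts, k):
    """Keep the k * batch_size largest positive activations in the batch; zero the rest.

    pre_acts: list of rows (one per token) of encoder pre-activations.
    Assumption: the budget is shared across the batch, not enforced per token.
    """
    budget = k * len(pre_acts)
    # Flatten to (value, row, col), keeping only positive entries (ReLU-style).
    flat = [
        (v, i, j)
        for i, row in enumerate(pre_acts)
        for j, v in enumerate(row)
        if v > 0
    ]
    flat.sort(key=lambda t: t[0], reverse=True)
    out = [[0.0] * len(row) for row in pre_acts]
    for v, i, j in flat[:budget]:
        out[i][j] = v
    return out


def fraction_variance_explained(x, x_hat):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    total = sum((x[i][j] - mean[j]) ** 2 for i in range(n) for j in range(d))
    resid = sum((x[i][j] - x_hat[i][j]) ** 2 for i in range(n) for j in range(d))
    return 1.0 - resid / total


# Toy batch of 2 tokens and 3 features with k = 1: the budget is 2 activations,
# and both of the largest values happen to fall on the first token.
sparse = batch_topk([[0.9, 0.1, 0.8], [0.2, -0.5, 0.3]], k=1)

# A perfect reconstruction explains all variance; predicting the mean explains none.
x = [[1.0, 2.0], [3.0, 4.0]]
fve_perfect = fraction_variance_explained(x, x)
fve_mean = fraction_variance_explained(x, [[2.0, 3.0], [2.0, 3.0]])
```

In a full crosscoder, the sparse activations would be decoded separately into each model's activation space, and the reconstruction quality would be reported per model or jointly; how the designated shared features are tied across the two decoders is not specified on this page.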
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks · Topic Modeling
