The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level

Jeremy Herbst; Stefan Wermter; Jae Hee Lee

arXiv:2604.02178·cs.CL·May 19, 2026

TL;DR

Using $k$-sparse probing, this paper finds that neurons in Mixture-of-Experts (MoE) experts are less polysemantic than neurons in dense FFNs, and argues that the expert, rather than the individual neuron, is an effective unit for interpreting these models.

Contribution

It proposes analyzing MoE language models at the expert level and validates the approach by automatically interpreting hundreds of experts, which appear to specialize in fine-grained linguistic and semantic tasks.

Findings

01

Neurons in MoE experts are less polysemantic than neurons in dense FFNs.

02

Sparser routing pushes both individual neurons and entire experts toward monosemanticity.

03

Experts function as fine-grained task specialists rather than broad domain experts.

Abstract

Mixture-of-Experts (MoE) architectures have become the dominant choice for scaling Large Language Models (LLMs), activating only a subset of parameters per token. While MoE architectures are primarily adopted for computational efficiency, it remains an open question whether their sparsity makes them inherently easier to interpret than dense feed-forward networks (FFNs). We compare MoE experts and dense FFNs using $k$-sparse probing and find that expert neurons are consistently less polysemantic, with the gap widening as routing becomes sparser. This suggests that sparsity pressures both individual neurons and entire experts toward monosemanticity. Leveraging this finding, we zoom out from the neuron to the expert level as a more effective unit of analysis. We validate this approach by automatically interpreting hundreds of experts. This analysis allows us to resolve the debate on…
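For readers unfamiliar with $k$-sparse probing, the following is a minimal sketch of the general idea, not the authors' implementation: restrict a linear probe to the $k$ neurons that best separate a feature, then measure how well the feature can be decoded from those neurons alone. High accuracy at very small $k$ suggests the feature is concentrated in a few neurons. Everything here is an assumption made for illustration: the toy data, the mean-difference ranking rule, the nearest-class-mean classifier (standing in for the logistic-regression probe that is more commonly used), and all function names.

```python
import random

def select_top_k(X, y, k):
    """Rank neurons by absolute difference of class means; return top-k indices."""
    d = len(X[0])
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    scores = []
    for j in range(d):
        mean_pos = sum(x[j] for x in pos) / len(pos)
        mean_neg = sum(x[j] for x in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return sorted(range(d), key=lambda j: scores[j], reverse=True)[:k]

def fit_centroid_probe(X, y, neurons):
    """Fit a nearest-class-mean (linear) probe using only the selected neurons."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(x[j] for x in rows) / len(rows) for j in neurons]
    return centroids

def predict(centroids, neurons, x):
    """Assign x to the class whose centroid is closest on the selected neurons."""
    def sq_dist(label):
        return sum((x[j] - c) ** 2 for j, c in zip(neurons, centroids[label]))
    return 0 if sq_dist(0) <= sq_dist(1) else 1

def probe_accuracy(X_train, y_train, X_test, y_test, k):
    """Select k neurons on the training split, then score the probe on held-out data."""
    neurons = select_top_k(X_train, y_train, k)
    centroids = fit_centroid_probe(X_train, y_train, neurons)
    correct = sum(predict(centroids, neurons, x) == t for x, t in zip(X_test, y_test))
    return neurons, correct / len(X_test)

def make_toy_data(n, d, signal_neuron, rng):
    """Hypothetical activations: one neuron encodes the label, the rest are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x[signal_neuron] = 2.0 * label + rng.gauss(0.0, 0.3)
        X.append(x)
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_toy_data(200, 8, 3, rng)
X_test, y_test = make_toy_data(200, 8, 3, rng)
neurons, acc = probe_accuracy(X_train, y_train, X_test, y_test, k=1)
print("selected neurons:", neurons, "held-out accuracy:", acc)
```

In this toy setup a single neuron carries the label, so a probe with $k = 1$ should already decode it well. Comparing accuracy across increasing $k$, and across MoE expert neurons versus dense FFN neurons, is the kind of comparison the abstract describes.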

Code & Models

Repositories

jerryy33/MoE_analysis (GitHub)
