What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models
Guimin Hu, Meng Li, Qiwei Peng, Lijie Hu, Boyan Xu, Ruichu Cai

TL;DR
This paper investigates expert activation patterns in MoE language models across three public domains. It identifies domain experts and driver experts, and shows that adjusting expert weights based on these roles improves model performance.
Contribution
The paper introduces an entropy-based metric to identify domain experts and a causal-effect metric to identify driver experts. It uses them to reveal the roles and influence of experts in MoE models, and shows that adjusting expert weights enhances performance.
Findings
Some experts show clear domain preferences.
Certain experts have a strong causal influence on performance.
Adjusting expert weights improves model performance across domains.
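The second and third findings can be illustrated with a toy intervention. This is a minimal sketch, not the paper's causal-effect metric: it assumes a single MoE layer whose output is a gate-weighted sum of scalar expert outputs, and it measures how much the output shifts when one expert is silenced. The function names, the renormalization of the gate weights, and all numbers are hypothetical.

```python
def moe_output(gate_weights, expert_outputs, scale=None):
    """Gate-weighted sum of expert outputs.

    `scale` optionally maps an expert index to a multiplier on its gate
    weight (0.0 silences the expert, >1.0 boosts it). Gate weights are
    renormalized after scaling; that is an assumption of this sketch.
    """
    scale = scale or {}
    weights = [g * scale.get(i, 1.0) for i, g in enumerate(gate_weights)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all experts were silenced")
    weights = [w / total for w in weights]
    return sum(w * o for w, o in zip(weights, expert_outputs))


def ablation_effect(gate_weights, expert_outputs, expert_idx):
    """Absolute change in the layer output when one expert is silenced."""
    base = moe_output(gate_weights, expert_outputs)
    ablated = moe_output(gate_weights, expert_outputs, scale={expert_idx: 0.0})
    return abs(base - ablated)


# Hypothetical toy values, not taken from the paper.
gates = [0.5, 0.3, 0.2]
outputs = [1.0, 1.1, 4.0]

effects = [ablation_effect(gates, outputs, i) for i in range(len(gates))]
boosted = moe_output(gates, outputs, scale={2: 2.0})  # up-weight expert 2
```

In this toy setting, an expert with a large ablation effect plays the role of a "driver", and the `scale` argument stands in for the expert weight adjustment described in the findings. A real analysis would measure the effect on task loss or accuracy rather than on a scalar layer output.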
Abstract
Most interpretability work focuses on layer- or neuron-level mechanisms in Transformers, leaving expert-level behavior in MoE LLMs underexplored. Motivated by functional specialization in the human brain, we analyze expert activation by distinguishing domain and driver experts. In this work, we study expert activation in MoE models across three public domains and address two key questions: (1) which experts are activated, and whether certain expert types exhibit consistent activation patterns; and (2) how tokens are associated with and trigger the activation of specific experts. To answer these questions, we introduce entropy-based and causal-effect metrics to assess whether an expert is strongly favored for a particular domain and how strongly expert activation contributes causally to the model's output, thereby identifying domain and driver experts, respectively. Furthermore, we explore…
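One way to picture an entropy-based domain-preference score is sketched below. The abstract does not give the paper's formula, so this is an illustrative assumption: it computes the normalized Shannon entropy of a single expert's activation counts across domains, where a low value means the expert is concentrated on one domain. The domain names, expert names, and counts are hypothetical.

```python
import math


def domain_entropy(counts):
    """Normalized Shannon entropy of one expert's activations across domains.

    `counts` maps a domain name to the number of tokens routed to this
    expert. The result lies in [0, 1]: values near 0 mean the expert fires
    almost only for one domain, values near 1 mean it is domain-agnostic.
    """
    if len(counts) < 2:
        raise ValueError("need at least two domains")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("expert was never activated")
    probs = [c / total for c in counts.values() if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))  # divide by the maximum entropy


def preferred_domain(counts):
    """Domain with the most activations for this expert."""
    return max(counts, key=counts.get)


# Hypothetical activation counts, not taken from the paper.
activations = {
    "expert_0": {"math": 90, "code": 5, "law": 5},
    "expert_1": {"math": 30, "code": 30, "law": 30},
}
scores = {name: domain_entropy(c) for name, c in activations.items()}
```

Here `expert_0` gets a lower score than `expert_1`, so under this sketch it would be the candidate domain expert. If the domains contribute different numbers of tokens, the raw counts would need to be normalized per domain first.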
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications
