Mixture-of-Experts as Soft Clustering: A Dual Jacobian-PCA Spectral Geometry Perspective
Feilong Liu

TL;DR
This paper provides a geometric analysis of Mixture-of-Experts architectures, showing how routing strategies influence local function sensitivity and representation diversity, with implications for model efficiency and interpretability.
Contribution
It introduces a Dual Jacobian-PCA spectral probe to analyze MoE geometry, revealing how routing affects local sensitivity and representation distribution.
Findings
MoE routing reduces local sensitivity: expert-local Jacobians have smaller leading singular values and faster spectral decay than dense baselines.
Expert representations have higher effective rank with MoE routing.
Low overlap among expert Jacobians suggests specialized transformations.
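The two quantities named in these findings, effective rank and overlap between expert Jacobians, can be made concrete with a small sketch. The summary does not say which definitions the paper uses, so both are assumptions here: effective rank is taken as the exponential of the Shannon entropy of the normalized singular values, and overlap as the mean squared cosine of the principal angles between the top-r right singular subspaces of two Jacobians. The function names are hypothetical.

```python
import numpy as np

def effective_rank(singular_values, eps=1e-12):
    """Entropy-based effective rank (assumed definition, not confirmed by the source)."""
    s = np.asarray(singular_values, dtype=float)
    p = s / (s.sum() + eps)          # normalize the spectrum to a distribution
    p = p[p > eps]                   # drop zero entries so log is defined
    return float(np.exp(-(p * np.log(p)).sum()))

def subspace_overlap(J_a, J_b, r):
    """Mean squared cosine of principal angles between the top-r
    right singular subspaces of two Jacobians (assumed overlap measure).
    Returns a value in [0, 1]; 0 means orthogonal input directions."""
    _, _, Vt_a = np.linalg.svd(J_a, full_matrices=False)
    _, _, Vt_b = np.linalg.svd(J_b, full_matrices=False)
    M = Vt_a[:r] @ Vt_b[:r].T        # r x r matrix of pairwise cosines
    return float(np.linalg.norm(M, "fro") ** 2 / r)
```

Under these definitions a flat spectrum of n equal values has effective rank n, and two experts that respond to disjoint input directions have overlap 0.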
Abstract
Mixture-of-Experts (MoE) architectures are widely used for efficiency and conditional computation, but their effect on the geometry of learned functions and representations remains poorly understood. We study MoEs through a geometric lens, interpreting routing as soft partitioning into overlapping expert-local charts. We introduce a Dual Jacobian-PCA spectral probe that analyzes local function geometry via Jacobian singular value spectra and representation geometry via weighted PCA of routed hidden states. Using a controlled MLP-MoE setting with exact Jacobian computation, we compare dense, Top-k, and fully soft routing under matched capacity. Across random seeds, MoE routing consistently reduces local sensitivity: expert-local Jacobians show smaller leading singular values and faster spectral decay than dense baselines. Weighted PCA reveals that expert-local representations distribute…
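As a rough illustration of the two halves of the probe described in the abstract, the sketch below builds a toy MLP-MoE with random weights, computes the exact Jacobian of one expert in closed form, and runs a gate-weighted PCA on that expert's hidden states. Everything specific is an assumption for illustration: the layer sizes, the tanh experts, the softmax Top-k gate with renormalization, and all names (`gate`, `expert_jacobian`, `weighted_pca_spectrum`). It is not the paper's model or code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out, n_experts, n_points = 8, 16, 8, 4, 256

# Hypothetical toy parameters (random, untrained).
W1 = rng.normal(size=(n_experts, d_hid, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(n_experts, d_out, d_hid)) / np.sqrt(d_hid)
Wg = rng.normal(size=(n_experts, d_in))

def gate(x, k=None):
    """Softmax gate; if k is given, keep the k largest weights and renormalize."""
    logits = Wg @ x
    w = np.exp(logits - logits.max())
    w /= w.sum()
    if k is not None:
        mask = np.zeros_like(w)
        mask[np.argsort(w)[-k:]] = 1.0
        w = w * mask
        w /= w.sum()
    return w

def expert_jacobian(e, x):
    """Exact Jacobian of expert e, where f_e(x) = W2[e] @ tanh(W1[e] @ x)."""
    h = np.tanh(W1[e] @ x)
    return W2[e] @ ((1.0 - h**2)[:, None] * W1[e])

def weighted_pca_spectrum(H, w):
    """Eigenvalues (descending) of the weighted covariance of the rows of H."""
    w = w / w.sum()
    mu = w @ H                          # weighted mean of hidden states
    Hc = H - mu
    cov = (Hc * w[:, None]).T @ Hc      # weighted covariance, d_hid x d_hid
    return np.linalg.eigvalsh(cov)[::-1]

X = rng.normal(size=(n_points, d_in))
G = np.stack([gate(x, k=2) for x in X])  # routing weights, (n_points, n_experts)

e = 0
# Function geometry: singular values of the expert-local Jacobian at one input.
jac_spectrum = np.linalg.svd(expert_jacobian(e, X[0]), compute_uv=False)
# Representation geometry: PCA of expert e's hidden states, weighted by its gate.
H_e = np.tanh(X @ W1[e].T)
pca_spectrum = weighted_pca_spectrum(H_e, G[:, e])
```

Passing `k=None` to `gate` gives fully soft routing, so the same two spectra can be compared across routing modes. A dense baseline would need a separately defined single MLP of matched capacity, which this sketch omits.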
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Reservoir Computing · Ferroelectric and Negative Capacitance Devices
