Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity
Yuxing Gan, Ziyu Lei

TL;DR
This paper introduces CDSP-MoE, a mixture-of-experts (MoE) framework that dynamically prunes conflicting pathways using gradient conflict signals, leading to emergent modularity and improved performance without explicit task instructions.
Contribution
It proposes a new paradigm for MoE architectures that uses gradient conflict as a supervisory signal to evolve interpretable modular structures within a shared parameter space.
Findings
Achieves robust content-driven routing without task labels.
Maintains semantic specialization under blind inference.
Spontaneously prunes conflicting pathways for emergent modularity.
Abstract
Mixture-of-Experts (MoE) architectures achieve parameter efficiency through conditional computation, yet contemporary designs suffer from two fundamental limitations: structural parameter isolation that causes catastrophic forgetting, and instruction-overfitting that degrades performance in instruction-free scenarios. We propose CDSP-MoE (Conflict-Driven Subspace Pruning MoE), a framework that addresses these issues through a paradigm shift from isolated expert containers to dynamic expert instantiation within a shared physical subspace. Grounded in the Universal Weight Subspace Hypothesis, CDSP-MoE maintains a super-complete parameter backbone where logical experts are carved out via learnable topology masks. Unlike prior work that uses gradient conflict for token reassignment or optimization surgery, we leverage it as a structural supervisory signal: a Lagged Gradient Game penalizes…
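The abstract is cut off before it states what the Lagged Gradient Game penalizes, so the sketch below is not the authors' method. It only illustrates the general idea of treating gradient conflict between logical experts that share a backbone as a signal for editing their masks. Everything in it is an assumption made for illustration: the function names, the `rate` parameter, the use of negative cosine similarity as the conflict measure, and the rule that shrinks the mask entry of the expert with the weaker gradient wherever the two gradients oppose on a shared coordinate.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def conflict_score(grad_a, grad_b):
    """Assumed conflict measure: positive only when the gradients oppose each other."""
    return max(0.0, -cosine(grad_a, grad_b))


def prune_step(mask_a, mask_b, grad_a, grad_b, rate=0.5):
    """Hypothetical pruning rule, not taken from the paper.

    On each backbone coordinate that both experts use (both masks > 0) and
    where their gradients have opposite signs, shrink the mask entry of the
    expert whose gradient is smaller in magnitude by a factor (1 - rate).
    """
    new_a, new_b = list(mask_a), list(mask_b)
    for i, (ga, gb) in enumerate(zip(grad_a, grad_b)):
        shared = mask_a[i] > 0 and mask_b[i] > 0
        if shared and ga * gb < 0:
            if abs(ga) < abs(gb):
                new_a[i] *= 1.0 - rate
            else:
                new_b[i] *= 1.0 - rate
    return new_a, new_b


# Two logical experts over a 3-coordinate shared backbone.
mask_a = [1.0, 1.0, 0.0]
mask_b = [1.0, 1.0, 1.0]
grad_a = [1.0, -0.2, 0.5]
grad_b = [0.5, 2.0, -1.0]

pruned_a, pruned_b = prune_step(mask_a, mask_b, grad_a, grad_b)
print(pruned_a, pruned_b)  # [1.0, 0.5, 0.0] [1.0, 1.0, 1.0]
```

In this toy run only coordinate 1 is both shared and in conflict, so expert A, whose gradient there is weaker, gives up half of its mask weight on that coordinate. Coordinate 2 also has opposing gradients but is left alone because expert A does not use it.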
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Ferroelectric and Negative Capacitance Devices
