Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks
Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache

TL;DR
This paper introduces a novel method to extract efficient Mixture-of-Experts subnetworks from pretrained Vision Transformers by clustering activation patterns, enabling near-original performance with reduced computational costs.
Contribution
The proposed approach constructs MoE variants from pretrained models without costly retraining, using clustering to identify expert subnetworks and thereby significantly reducing the resources needed.
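As a rough illustration of what "extracting an expert subnetwork" from an MLP can mean computationally, the sketch below assumes an expert is simply a subset of an MLP's hidden units. This summary does not spell out the exact expert structure, so that is an assumption. It slices the corresponding columns of the first weight matrix and rows of the second, and checks that the smaller MLP matches the full MLP with all other hidden units zeroed. `extract_expert` is a hypothetical helper, and the dimensions and chosen units are arbitrary.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU, as commonly used in ViT MLPs
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def mlp(x, w1, b1, w2, b2):
    """Dense two-layer MLP: x (n, d) -> (n, d), hidden width h = w1.shape[1]."""
    return gelu(x @ w1 + b1) @ w2 + b2


def extract_expert(w1, b1, w2, idx):
    """Hypothetical helper: keep only the hidden units in `idx` as a smaller MLP."""
    return w1[:, idx], b1[idx], w2[idx, :]


rng = np.random.default_rng(0)
d, h = 8, 32
w1, b1 = rng.normal(size=(d, h)), rng.normal(size=h)
w2, b2 = rng.normal(size=(h, d)), rng.normal(size=d)
x = rng.normal(size=(4, d))

idx = np.arange(0, h, 2)  # hypothetical expert: every other hidden unit
e_w1, e_b1, e_w2 = extract_expert(w1, b1, w2, idx)
expert_out = mlp(x, e_w1, e_b1, e_w2, b2)

# Reference: the full MLP with every hidden unit outside `idx` zeroed after the activation
mask = np.zeros(h)
mask[idx] = 1.0
masked_out = (gelu(x @ w1 + b1) * mask) @ w2 + b2
print(np.allclose(expert_out, masked_out))  # True
```

The sliced expert does the same work as the masked full MLP, but it never computes the dropped units. That difference is where the compute savings of conditionally activating a subnetwork come from.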
Findings
Achieves 98% of the original accuracy with minimal fine-tuning
Reduces MACs by up to 36% and model size by 32%
Experts perform well out-of-the-box on ImageNet-1k
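MAC savings arise when an expert computes only part of an MLP's hidden layer. The per-token cost of a two-layer MLP can be counted directly to show this. The dimensions below (model width 768 and hidden width 3072, as in a ViT-B-sized model) and the expert width of 1536 are illustrative assumptions, not values from the paper. If only the MLP layers are reduced while attention is left as is, whole-network savings are smaller than the MLP-level figure. So this sketch shows the mechanism and does not reproduce the reported numbers.

```python
def mlp_macs_per_token(d_model: int, d_hidden: int) -> int:
    """MACs for one token through fc1 (d_model -> d_hidden) and fc2 (d_hidden -> d_model).

    Biases and the activation function are ignored.
    """
    return d_model * d_hidden + d_hidden * d_model


# Illustrative, assumed dimensions (not taken from the paper)
full_macs = mlp_macs_per_token(768, 3072)    # dense MLP
expert_macs = mlp_macs_per_token(768, 1536)  # hypothetical expert keeping half the hidden units
saving = 1 - expert_macs / full_macs
print(full_macs, expert_macs, f"{saving:.0%}")  # 4718592 2359296 50%
```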
Abstract
Vision Transformers have emerged as the state-of-the-art models in various Computer Vision tasks, but their high computational and resource demands pose significant challenges. While Mixture-of-Experts (MoE) can make these models more efficient, they often require costly retraining or even training from scratch. Recent developments aim to reduce these computational costs by leveraging pretrained networks. These have been shown to produce sparse activation patterns in the Multi-Layer Perceptrons (MLPs) of the encoder blocks, allowing for conditional activation of only relevant subnetworks for each sample. Building on this idea, we propose a new method to construct MoE variants from pretrained models. Our approach extracts expert subnetworks from the model's MLP layers post-training in two phases. First, we cluster output activations to identify distinct activation patterns. In the second…
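The first phase described in the abstract clusters output activations to find distinct activation patterns. It can be sketched as follows, under several assumptions. Each sample is represented by a single non-negative MLP activation vector (for example, averaged over tokens). Plain k-means with deterministic farthest-point initialisation stands in for whatever clustering algorithm the authors use. `cluster_activation_patterns` and its `thresh` parameter are hypothetical. The per-cluster activation frequencies only describe each cluster's pattern. How the second phase turns clusters into experts is not covered here.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, n_iter: int = 50) -> tuple[np.ndarray, np.ndarray]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [x[0]]
    for _ in range(1, k):
        # squared distance from each sample to its nearest already-chosen centroid
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def cluster_activation_patterns(acts: np.ndarray, k: int, thresh: float = 0.0):
    """Cluster per-sample activation vectors (n_samples, d_hidden).

    Returns cluster labels and, per cluster, the fraction of members in which
    each hidden unit is active (activation > thresh).
    """
    _, labels = kmeans(acts, k)
    active = acts > thresh
    freq = np.zeros((k, acts.shape[1]))
    for j in range(k):
        if (labels == j).any():
            freq[j] = active[labels == j].mean(axis=0)
    return labels, freq


# Toy data: two groups of samples that activate disjoint sets of hidden units
rng = np.random.default_rng(0)
a = np.hstack([1.0 + 0.1 * rng.random((10, 3)), np.zeros((10, 3))])
b = np.hstack([np.zeros((10, 3)), 1.0 + 0.1 * rng.random((10, 3))])
acts = np.vstack([a, b])
labels, freq = cluster_activation_patterns(acts, k=2)
print(labels)
print(freq)
```

With GELU activations, which produce small negative values, a small positive `thresh` may be a more sensible definition of "active" than zero.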
Taxonomy
Topics: Complex Network Analysis Techniques · Expert finding and Q&A systems · Data-Driven Disease Surveillance
Methods: Mixture of Experts
