Mixture of Experts in Image Classification: What's the Sweet Spot?
Mathurin Videau, Alessandro Leite, Marc Schoenauer, Olivier Teytaud

TL;DR
This paper systematically analyzes the use of Mixture-of-Experts layers in image classification models, revealing optimal configurations for balancing performance and efficiency across different model sizes and datasets.
Contribution
It provides practical insights and heuristics for integrating MoE layers into vision models, including placement, number of experts, and routing strategies, based on extensive experiments.
Findings
Activating a moderate number of parameters per sample offers the best performance-efficiency trade-off.
MoE benefits are most significant for tiny and mid-sized models and taper off for large models.
Simple linear routing performs best; more complex routing strategies yield only minimal gains.
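To make "simple linear routing" concrete, here is a minimal sketch of a linear router feeding a sparse MoE layer. It is an illustration, not the paper's implementation: the top-k selection, the softmax over only the selected logits, and the names `linear_router` and `moe_forward` are all assumptions made for this example.

```python
import math
import random


def linear_router(x, W, k):
    """Score every expert with one linear map and keep the k best.

    x: token features, a list of d floats.
    W: router weights, one row of d floats per expert.
    Returns (expert_index, gate_weight) pairs whose gates sum to 1.
    """
    logits = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Assumption: softmax is taken over the selected logits only.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_forward(x, W, experts, k):
    """Gate-weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(x)
    for i, gate in linear_router(x, W, k):
        y = experts[i](x)
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out


# Toy usage: 8 hypothetical experts that just rescale their input.
random.seed(0)
d, num_experts, k = 4, 8, 2
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_experts)]
experts = [lambda v, s=i + 1: [s * v_j for v_j in v] for i in range(num_experts)]
x = [0.5, -1.0, 2.0, 0.1]
routes = linear_router(x, W, k)
y = moe_forward(x, W, experts, k)
```

The router here is a single weight matrix with no hidden layer, which is what makes it "linear"; only the k selected experts are evaluated for each token.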
Abstract
Mixture-of-Experts (MoE) models have shown promising potential for parameter-efficient scaling across domains. However, their application to image classification remains limited, often requiring billion-scale datasets to be competitive. In this work, we explore the integration of MoE layers into image classification architectures using open datasets. We conduct a systematic analysis across different MoE configurations and model scales. We find that moderate parameter activation per sample provides the best trade-off between performance and efficiency. However, as the number of activated parameters increases, the benefits of MoE diminish. Our analysis yields several practical insights for vision MoE design. First, MoE layers most effectively strengthen tiny and mid-sized models, while gains taper off for large-capacity networks and do not redefine state-of-the-art ImageNet performance.…
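The notion of "parameter activation per sample" can be illustrated with a small count of total versus activated parameters in one MoE feed-forward layer. This is a sketch under stated assumptions: each expert is taken to be a two-layer MLP with biases, the router is a bias-free linear map, and the layer sizes and the helper name `moe_ffn_params` are illustrative rather than taken from the paper.

```python
def moe_ffn_params(d_model, d_hidden, num_experts, top_k):
    """Return (total, active) parameter counts for one MoE feed-forward layer.

    Assumes each expert is Linear(d_model, d_hidden) followed by
    Linear(d_hidden, d_model), both with biases, and a bias-free linear router.
    """
    expert = 2 * d_model * d_hidden + d_hidden + d_model
    router = d_model * num_experts
    total = num_experts * expert + router   # parameters stored in the layer
    active = top_k * expert + router        # parameters used for one token
    return total, active


# Illustrative sizes only (hypothetical configuration).
total, active = moe_ffn_params(d_model=192, d_hidden=768, num_experts=8, top_k=2)
ratio = active / total
```

With these sizes, roughly a quarter of the layer's parameters are used for any single token, while the stored parameter count grows linearly with the number of experts.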
Taxonomy
Topics: AI in cancer detection · Radiomics and Machine Learning in Medical Imaging
Methods: Mixture of Experts
