Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability
Svetlana Pavlitska, Christian Hubschneider, Lukas Struppek, J. Marius Zöllner

TL;DR
This paper applies sparsely-gated Mixture-of-Expert (MoE) layers to CNNs for computer vision tasks and shows that the experts specialize in different input sub-domains and object sizes, which makes the model more interpretable.
Contribution
The paper introduces soft and hard constraint-based methods for stabilizing MoE training in CNNs and shows how experts specialize in different visual sub-domains and object sizes, which benefits both interpretability and performance.
Findings
Experts specialize in input sub-domains like flowers or animals.
Hard constraints maintain more generalized experts and increase overall model performance.
Soft constraints improve expert utilization and specialization.
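One way to make the specialization findings above concrete is to tabulate how often each expert is selected for each class. The following is a minimal sketch, not the authors' analysis code: the function name `routing_table`, the use of a single selected expert per sample, and the toy labels are all assumptions made for illustration.

```python
import numpy as np

def routing_table(labels, selected_experts, num_classes, num_experts):
    """Row c holds the fraction of class-c samples routed to each expert.

    labels and selected_experts are integer arrays of equal length
    (hypothetical inputs: one class label and one chosen expert per sample).
    """
    counts = np.zeros((num_classes, num_experts))
    # Unbuffered accumulation so repeated (class, expert) pairs are all counted.
    np.add.at(counts, (labels, selected_experts), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # Classes with no samples keep an all-zero row instead of dividing by zero.
    return counts / np.maximum(totals, 1)

# Toy example: class 0 always goes to expert 0, class 1 mostly to expert 1.
labels = np.array([0, 0, 1, 1, 1])
chosen = np.array([0, 0, 1, 1, 0])
table = routing_table(labels, chosen, num_classes=2, num_experts=2)
```

A row dominated by one expert suggests that expert has specialized in that class, while a near-uniform row suggests a more generalized routing.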
Abstract
Sparsely-gated Mixture of Expert (MoE) layers have been recently successfully applied for scaling large transformers, especially for language modeling tasks. An intriguing side effect of sparse MoE layers is that they convey inherent interpretability to a model via natural expert specialization. In this work, we apply sparse MoE layers to CNNs for computer vision tasks and analyze the resulting effect on model interpretability. To stabilize MoE training, we present both soft and hard constraint-based approaches. With hard constraints, the weights of certain experts are allowed to become zero, while soft constraints balance the contribution of experts with an additional auxiliary loss. As a result, soft constraints handle expert utilization better and support the expert specialization process, while hard constraints maintain more generalized experts and increase overall model…
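The mechanism the abstract describes can be sketched roughly as follows. This is an illustrative NumPy sketch under stated assumptions, not the paper's implementation: the gate design (global average pooling followed by a linear layer), the top-k selection, and the specific auxiliary loss (squared coefficient of variation of per-expert importance, a common choice in the sparse MoE literature) are all assumptions, since the abstract says only that soft constraints use "an additional auxiliary loss". All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def top_k_gate(logits, k):
    """Softmax over the k largest logits per sample; all other gates are exactly 0."""
    top_idx = np.argsort(logits, axis=1)[:, -k:]
    masked = np.full_like(logits, -np.inf)
    top_vals = np.take_along_axis(logits, top_idx, axis=1)
    np.put_along_axis(masked, top_idx, top_vals, axis=1)
    return softmax(masked, axis=1)

def importance_loss(gates):
    """Assumed auxiliary loss: squared coefficient of variation of expert importance."""
    importance = gates.sum(axis=0)  # total gate weight each expert received in the batch
    return importance.var() / (importance.mean() ** 2 + 1e-10)

def moe_forward(x, gate_w, experts, k=2):
    """x: (N, C, H, W) feature map; gate_w: (C, E); experts: list of E callables."""
    logits = x.mean(axis=(2, 3)) @ gate_w  # assumed gate: global average pool + linear
    gates = top_k_gate(logits, k)
    # Dense evaluation for clarity; a sparse implementation would skip experts with gate 0.
    out = sum(gates[:, i, None, None, None] * experts[i](x) for i in range(len(experts)))
    return out, gates

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))
gate_w = rng.normal(size=(3, 4))
# Stand-in experts (simple scalings); in a CNN these would be convolutional blocks.
experts = [lambda t, s=s: s * t for s in (1.0, 2.0, 3.0, 4.0)]
out, gates = moe_forward(x, gate_w, experts, k=2)
aux = importance_loss(gates)
```

In training, the auxiliary term would be added to the task loss with a small weight so that the gate is discouraged from routing every input to the same few experts.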
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Feature Pyramid Network · 1x1 Convolution · Convolution · Focal Loss · RetinaNet
