Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability

Svetlana Pavlitska; Christian Hubschneider; Lukas Struppek; J. Marius Zöllner

arXiv:2204.10598·cs.CV·April 28, 2023·1 cites

Open Access

TL;DR

This paper explores the application of sparsely-gated Mixture-of-Expert layers to CNNs in computer vision, demonstrating improved interpretability and specialization of experts in recognizing different input sub-domains and object sizes.

Contribution

It introduces methods for stabilizing MoE training in CNNs and shows how experts specialize in different visual domains and object sizes, enhancing interpretability and performance.

Findings

01. Experts specialize in input sub-domains like flowers or animals.

02. Hard constraints increase model performance and generalization.

03. Soft constraints improve expert utilization and specialization.

Abstract

Sparsely-gated Mixture of Expert (MoE) layers have been recently successfully applied for scaling large transformers, especially for language modeling tasks. An intriguing side effect of sparse MoE layers is that they convey inherent interpretability to a model via natural expert specialization. In this work, we apply sparse MoE layers to CNNs for computer vision tasks and analyze the resulting effect on model interpretability. To stabilize MoE training, we present both soft and hard constraint-based approaches. With hard constraints, the weights of certain experts are allowed to become zero, while soft constraints balance the contribution of experts with an additional auxiliary loss. As a result, soft constraints handle expert utilization better and support the expert specialization process, while hard constraints maintain more generalized experts and increase overall model…
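
To make the soft-constraint idea concrete, here is a minimal sketch of sparse top-k gating with an auxiliary balancing loss. It is not the paper's implementation: the abstract does not give the exact loss, so the sketch assumes the importance loss from the original sparsely-gated MoE formulation (squared coefficient of variation of per-expert gate sums). The function names `top_k_gate` and `importance_loss` are hypothetical, and the hard-constraint variant is not shown.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits, k):
    # Sparse gating: keep the k largest gate logits, normalize them with
    # a softmax, and give every other expert a weight of exactly zero.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    gates = [0.0] * len(logits)
    for i, p in zip(top, probs):
        gates[i] = p
    return gates


def importance_loss(batch_gates):
    # Assumed auxiliary loss (not taken from the abstract): squared
    # coefficient of variation of each expert's summed gate weight.
    # It is zero when all experts are used equally over the batch.
    n_experts = len(batch_gates[0])
    importance = [sum(g[e] for g in batch_gates) for e in range(n_experts)]
    mean = sum(importance) / n_experts
    var = sum((v - mean) ** 2 for v in importance) / n_experts
    return var / (mean ** 2 + 1e-10)


# Toy batches of gate logits for 4 experts with k=1.
balanced = [top_k_gate(l, k=1) for l in (
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
)]
collapsed = [top_k_gate([2.0, 0.0, 0.0, 0.0], k=1) for _ in range(4)]

print(importance_loss(balanced))   # close to 0: every expert is used once
print(importance_loss(collapsed))  # close to 3: one expert takes all inputs
```

Adding a term like this to the task loss penalizes routing that collapses onto a few experts, which is the balancing role the abstract attributes to soft constraints.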

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Feature Pyramid Network · 1x1 Convolution · Convolution · Focal Loss · RetinaNet