TL;DR
This paper explores the integration of sparse mixture-of-experts layers into CNNs for semantic segmentation, demonstrating architecture-dependent improvements with minimal overhead and providing empirical insights into their design.
Contribution
It introduces a coarse, patch-wise sparse MoE formulation for CNNs in semantic segmentation and analyzes how architectural choices influence routing and specialization.
Findings
Gains of up to 3.9 mIoU on the Cityscapes and BDD100K datasets.
Sparse MoE layers achieve these improvements with little additional computational cost.
Design choices significantly affect routing dynamics and expert specialization.
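One way to make "routing dynamics" concrete is to look at how evenly patches are spread across experts. The sketch below is an illustration only: the function name `routing_stats`, the load-fraction measure, and the normalized-entropy measure are assumptions chosen for this example, not metrics taken from the paper.

```python
import numpy as np

def routing_stats(assignments, num_experts):
    """Summarize expert usage from integer expert ids (hypothetical helper).

    assignments: array-like of expert indices, one per routed patch.
    num_experts: total number of experts (must be > 1).
    Returns (load, normalized_entropy), where load[e] is the fraction of
    patches sent to expert e and normalized_entropy lies in [0, 1].
    """
    ids = np.asarray(assignments).ravel()
    counts = np.bincount(ids, minlength=num_experts)
    load = counts / counts.sum()
    # Entropy of the load distribution; experts with zero load contribute 0.
    nonzero = load[load > 0]
    entropy = -(nonzero * np.log(nonzero)).sum()
    return load, entropy / np.log(num_experts)
```

A normalized entropy near 1 means patches are spread evenly over the experts, while a value near 0 means routing has collapsed onto a single expert.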
Abstract
Sparse mixture-of-experts (MoE) layers have been shown to substantially increase model capacity without a proportional increase in computational cost and are widely used in transformer architectures, where they typically replace feed-forward network blocks. In contrast, integrating sparse MoE layers into convolutional neural networks (CNNs) remains inconsistent, with most prior work focusing on fine-grained MoEs operating at the filter or channel levels. In this work, we investigate a coarser, patch-wise formulation of sparse MoE layers for semantic segmentation, where local regions are routed to a small subset of convolutional experts. Through experiments on the Cityscapes and BDD100K datasets using encoder-decoder and backbone-based CNNs, we conduct a design analysis to assess how architectural choices affect routing dynamics and expert specialization. Our results demonstrate…
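The patch-wise routing described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the gate is assumed to be a linear map on mean-pooled patch features, the experts are assumed to be 1x1 convolutions (per-pixel channel mixing) standing in for general convolutional experts, and the names `patchwise_sparse_moe`, `gate_w`, `expert_w`, `patch`, and `top_k` are hypothetical.

```python
import numpy as np

def patchwise_sparse_moe(x, gate_w, expert_w, patch=4, top_k=1):
    """Route each non-overlapping patch of x to its top_k experts.

    x:        (C, H, W) feature map, H and W divisible by `patch`.
    gate_w:   (E, C) weights of an assumed linear gate.
    expert_w: (E, C_out, C) weights of assumed 1x1-conv experts.
    Returns the (C_out, H, W) output and the (H/patch, W/patch, top_k)
    array of selected expert indices.
    """
    C, H, W = x.shape
    E, C_out, _ = expert_w.shape
    assert H % patch == 0 and W % patch == 0
    ph, pw = H // patch, W // patch

    # Split into patches: (ph, pw, C, patch, patch).
    patches = x.reshape(C, ph, patch, pw, patch).transpose(1, 3, 0, 2, 4)

    # One gating decision per patch, from its mean-pooled descriptor.
    logits = patches.mean(axis=(3, 4)) @ gate_w.T            # (ph, pw, E)
    top_idx = np.argsort(-logits, axis=-1)[..., :top_k]      # (ph, pw, k)
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)

    # Softmax over the selected experts only, so weights sum to 1 per patch.
    top_logits = top_logits - top_logits.max(axis=-1, keepdims=True)
    weights = np.exp(top_logits)
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.zeros((ph, pw, C_out, patch, patch), dtype=x.dtype)
    for e in range(E):
        for k in range(top_k):
            mask = top_idx[..., k] == e                      # (ph, pw)
            if not mask.any():
                continue  # sparse: unselected experts do no work
            y = np.einsum("oc,nchw->nohw", expert_w[e], patches[mask])
            out[mask] += weights[..., k][mask][:, None, None, None] * y

    # Stitch the patches back into a (C_out, H, W) map.
    return out.transpose(2, 0, 3, 1, 4).reshape(C_out, H, W), top_idx
```

Each expert only processes the patches routed to it, which is where the compute saving of a sparse layer comes from: adding experts increases capacity, while the per-patch cost depends on `top_k` rather than on the total number of experts.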
