Design and Behavior of Sparse Mixture-of-Experts Layers in CNN-based Semantic Segmentation

Svetlana Pavlitska; Haixi Fan; Konstantin Ditschuneit; J. Marius Zöllner

arXiv:2604.13761·cs.CV·April 16, 2026

TL;DR

This paper explores the integration of sparse mixture-of-experts layers into CNNs for semantic segmentation, demonstrating architecture-dependent improvements with minimal overhead and providing empirical insights into their design.

Contribution

The paper introduces a coarse, patch-wise sparse MoE formulation for CNN-based semantic segmentation and analyzes how architectural choices influence routing and expert specialization.

Findings

01. Up to +3.9 mIoU improvement on Cityscapes and BDD100K datasets.

02. Sparse MoE layers achieve these improvements with little additional computational cost.

03. Design choices significantly affect routing dynamics and expert specialization.

Abstract

Sparse mixture-of-experts (MoE) layers have been shown to substantially increase model capacity without a proportional increase in computational cost and are widely used in transformer architectures, where they typically replace feed-forward network blocks. In contrast, integrating sparse MoE layers into convolutional neural networks (CNNs) remains inconsistent, with most prior work focusing on fine-grained MoEs operating at the filter or channel levels. In this work, we investigate a coarser, patch-wise formulation of sparse MoE layers for semantic segmentation, where local regions are routed to a small subset of convolutional experts. Through experiments on the Cityscapes and BDD100K datasets using encoder-decoder and backbone-based CNNs, we conduct a design analysis to assess how architectural choices affect routing dynamics and expert specialization. Our results demonstrate…
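
The patch-wise routing idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `patchwise_sparse_moe`, the mean-pooled linear gate, the softmax over the selected experts, and the use of 1x1 convolutions as experts are all assumptions chosen to keep the example short. The paper's actual gate, expert architecture, and patch handling may differ.

```python
import numpy as np

def patchwise_sparse_moe(x, gate_w, expert_ws, patch=4, k=2):
    """Hypothetical patch-wise top-k MoE layer (illustrative only).

    x:         (C, H, W) feature map
    gate_w:    (E, C) linear gate applied to mean-pooled patch features (assumed design)
    expert_ws: (E, C_out, C) weights of E experts, each a 1x1 convolution (assumed design)
    """
    C, H, W = x.shape
    E, C_out, _ = expert_ws.shape
    assert H % patch == 0 and W % patch == 0, "sketch assumes evenly divisible patches"
    out = np.zeros((C_out, H, W), dtype=x.dtype)
    routes = np.zeros((H // patch, W // patch, k), dtype=int)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            region = x[:, i:i + patch, j:j + patch]        # (C, patch, patch)
            logits = gate_w @ region.mean(axis=(1, 2))     # one gate logit per expert
            top = np.argsort(logits)[-k:]                  # indices of the k largest logits
            w = np.exp(logits[top] - logits[top].max())    # softmax over selected experts only
            w /= w.sum()
            for weight, e in zip(w, top):
                # A 1x1 convolution is a per-pixel channel mixing.
                out[:, i:i + patch, j:j + patch] += weight * np.einsum(
                    "oc,chw->ohw", expert_ws[e], region)
            routes[i // patch, j // patch] = top
    return out, routes

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))          # 8 channels, 16x16 feature map
gate_w = rng.standard_normal((4, 8))          # 4 experts
expert_ws = rng.standard_normal((4, 8, 8))
y, routes = patchwise_sparse_moe(x, gate_w, expert_ws, patch=4, k=2)
print(y.shape, routes.shape)                  # (8, 16, 16) (4, 4, 2)
```

Only `k` of the `E` experts run for each patch, so the per-patch compute depends on `k` rather than on the total number of experts. Experts with larger kernels would additionally need context across patch borders, which this sketch sidesteps by using 1x1 convolutions.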

Code & Models

Repositories

KASTEL-MobilityLab/moe-layers (GitHub)
