Multi-Narrow Transformation as a Single-Model Ensemble: Boundary Conditions, Mechanisms, and Failure Modes
Tatsuhito Hasegawa, Taisei Tanaka

TL;DR
This paper examines how partitioning a CNN into multiple narrow, independent branches affects performance, and finds that the amount of available training data determines whether this approach improves generalization.
Contribution
It introduces the Multi-Narrow transformation, systematically compares it with single-wide models, and provides insights into capacity allocation based on data regimes.
Findings
Multi-Narrow models outperform the baseline in low-data settings.
Highly partitioned MN models learn more diverse features, which aids generalization.
In data-rich regimes, weakly partitioned or baseline-wide models are more effective.
Abstract
Single-model ensembles (SMEs) have attracted attention as a way to approximate some of the benefits of deep ensembles within a single network. However, under an approximately matched parameter budget, it remains unclear whether model capacity should be concentrated in a single wide pathway or redistributed into many narrow and independent members. We investigate this question through the Multi-Narrow (MN) transformation, which converts a baseline CNN into an SME of narrow, path-wise independent branches while approximately preserving the dominant parameter budget. We systematically compare Single-Wide and Multi-Narrow configurations across different training-data regimes, architectures, and datasets. The results show that the effectiveness of MN is strongly data-dependent: weakly partitioned or baseline-wide models are preferable in data-rich settings, whereas highly partitioned MN…
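As a rough illustration of what "approximately preserving the dominant parameter budget" could look like, the sketch below assumes that the budget is dominated by square k x k convolutions and that each of M independent branches is narrowed by a factor of sqrt(M). This scaling rule and the helper names (`conv_params`, `narrow_width`, `multi_narrow_params`) are assumptions made for illustration; the abstract does not state the exact rule the authors use.

```python
import math


def conv_params(c_in, c_out, k=3):
    """Weight count of one k x k convolution, ignoring bias terms."""
    return c_in * c_out * k * k


def narrow_width(wide_width, num_branches):
    """Per-branch width under the ASSUMED rule width / sqrt(M).

    This keeps the summed conv weights of M independent branches
    roughly equal to those of the single wide layer.
    """
    return max(1, round(wide_width / math.sqrt(num_branches)))


def multi_narrow_params(wide_width, num_branches, k=3):
    """Total conv weights across all branches for one layer position."""
    w = narrow_width(wide_width, num_branches)
    return num_branches * conv_params(w, w, k)


if __name__ == "__main__":
    wide = 64  # hypothetical baseline channel width
    baseline = conv_params(wide, wide)
    for m in (1, 4, 16):
        total = multi_narrow_params(wide, m)
        print(f"M={m:2d} width={narrow_width(wide, m):2d} "
              f"params={total} ratio={total / baseline:.2f}")
```

Under this assumption, raising M trades width per branch for the number of independent members at a fixed weight count, which is the Single-Wide versus Multi-Narrow trade-off the abstract describes. When the width does not divide evenly, rounding makes the match only approximate.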
