Slimmable ConvNeXt: Width-Adaptive Inference for Efficient Multi-Device Deployment
Janek Haberer, Jon Eike Wilhelm, Olaf Landsiedel

TL;DR
Slimmable ConvNeXt is a width-adaptive inference method for vision models: a single trained set of weights contains multiple nested subnetworks, which enables efficient multi-device deployment. It outperforms prior CNN and ViT approaches on ImageNet-1k.
Contribution
The paper shows that ConvNeXt's design, specifically LayerNorm and inverted bottlenecks, supports effective channel-width slimming without switchable batch normalization, which simplifies training and improves accuracy across compute budgets.
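To make the idea of channel-width slimming concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single pointwise layer whose narrower variants reuse the leading slice of one shared weight matrix, preceded by a LayerNorm without learned scale or shift. The names `SlimmableLinear` and `layer_norm` and the widths 16, 32, and 64 are hypothetical and chosen only for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Statistics are computed per sample over the active channels (last axis),
    # so nothing has to be stored or switched per width.
    # Simplification: the learned scale and shift are omitted here.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

class SlimmableLinear:
    """Hypothetical pointwise layer; all sub-widths share one weight matrix."""

    def __init__(self, max_in, max_out, rng):
        self.weight = rng.standard_normal((max_out, max_in)) / np.sqrt(max_in)
        self.bias = np.zeros(max_out)

    def __call__(self, x, out_width):
        in_width = x.shape[-1]
        # A narrower subnetwork uses the top-left slice of the full weights.
        w = self.weight[:out_width, :in_width]
        return x @ w.T + self.bias[:out_width]

rng = np.random.default_rng(0)
layer = SlimmableLinear(max_in=64, max_out=64, rng=rng)
x = rng.standard_normal((8, 64))  # 8 samples, 64 channels

outputs = {}
for width in (16, 32, 64):  # illustrative widths, not taken from the paper
    h = layer_norm(x[:, :width])
    outputs[width] = layer(h, out_width=width)
```

Because the normalization statistics come from each input sample rather than from running averages, the same code path serves every width. Batch normalization, by contrast, keeps running statistics that differ per width, which is why classical slimmable networks switch between separate normalization layers.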
Findings
Achieves 80.8% top-1 accuracy at 4.5 GMACs on ImageNet-1k.
Outperforms HydraViT and other models at comparable compute levels.
Scaling to ConvNeXt-B yields 82.8% accuracy at 15.35 GMACs.
Abstract
Deploying vision models across devices with varying resource constraints, or even on a single device where available compute fluctuates due to battery state, thermal throttling, or latency deadlines, typically requires training and maintaining separate models. Width-adaptive inference addresses this by training a single set of shared weights containing multiple nested subnetworks of increasing capacity, but prior CNN-based approaches required switchable batch normalization, while recent scalable methods have focused on Vision Transformers. We present Slimmable ConvNeXt, which shows that ConvNeXt's modern design, specifically LayerNorm and inverted bottlenecks, makes it particularly suited for channel-width slimming, eliminating the normalization overhead of classical slimmable networks and producing a simpler training pipeline than both prior CNN and ViT approaches. On ImageNet-1k,…
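The runtime side of width-adaptive inference, picking a subnetwork when available compute changes, can be sketched as follows. This is an illustrative assumption, not a procedure described in the source: the function `pick_width`, the width fractions, and the cost table are all hypothetical, and the costs are placeholder values in arbitrary units rather than measurements.

```python
def pick_width(costs, budget):
    """Return the widest subnetwork whose cost fits the budget.

    Falls back to the narrowest subnetwork if none fits.
    """
    feasible = [width for width, cost in costs.items() if cost <= budget]
    return max(feasible) if feasible else min(costs)

# Hypothetical table: width fraction -> cost in arbitrary units.
# Placeholder values only; they are not results from the paper.
costs = {0.25: 1.0, 0.5: 4.0, 0.75: 9.0, 1.0: 16.0}

normal = pick_width(costs, budget=20.0)      # ample budget
throttled = pick_width(costs, budget=10.0)   # e.g. thermal throttling
critical = pick_width(costs, budget=0.5)     # nothing fits, use narrowest
```

Since all widths share one set of weights, switching between them in a sketch like this requires no model reload, only a different slice of the same parameters.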
