TL;DR
This paper proposes dividing one large neural network into several smaller ones that are trained together. The resulting ensemble achieves a better accuracy-efficiency trade-off than simply widening the network, with few or no extra parameters or FLOPs.
Contribution
The paper introduces a divide and co-training method: a single large network is replaced by several small networks that are trained jointly and used as an ensemble, improving both accuracy and efficiency.
Findings
An ensemble of small networks can outperform the single large network it was divided from.
Showing the small networks different views of the same data increases their diversity, and co-training lets them learn from each other.
The method improves accuracy-efficiency trade-offs across various architectures.
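To make the ensemble and co-training ideas concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the ensemble prediction is the average of each small network's softmax output, and that "learning from each other" is encouraged by a Jensen-Shannon divergence term between the networks' predictions. The function names, the example logits, and the `weight` parameter are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def ensemble_probs(all_logits):
    """Average the softmax outputs of several small networks."""
    probs = [softmax(logits) for logits in all_logits]
    n = len(probs)
    return [sum(col) / n for col in zip(*probs)]

def js_divergence(all_logits):
    """Generalized Jensen-Shannon divergence between the networks' predictions:
    entropy of the mean prediction minus the mean of the individual entropies."""
    probs = [softmax(logits) for logits in all_logits]
    mean_p = ensemble_probs(all_logits)
    return entropy(mean_p) - sum(entropy(p) for p in probs) / len(probs)

def co_training_loss(all_logits, label, weight=0.5):
    """Assumed objective: mean cross-entropy of each network plus a weighted
    agreement term. `weight` is a hypothetical hyperparameter."""
    ce = sum(-math.log(softmax(logits)[label]) for logits in all_logits)
    ce /= len(all_logits)
    return ce + weight * js_divergence(all_logits)

# Two small networks scoring the same image (seen under different views).
logits_a = [2.0, 0.5, -1.0]
logits_b = [1.0, 1.5, -0.5]
print(ensemble_probs([logits_a, logits_b]))
print(co_training_loss([logits_a, logits_b], label=0))
```

The divergence term is zero when all networks predict the same distribution and grows as they disagree, so minimizing it pulls the networks toward one another while the per-network cross-entropy keeps each of them accurate.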
Abstract
The width of a neural network matters since increasing the width will necessarily increase the model capacity. However, the performance of a network does not improve linearly with the width and soon gets saturated. In this case, we argue that increasing the number of networks (ensemble) can achieve better accuracy-efficiency trade-offs than purely increasing the width. To prove it, one large network is divided into several small ones regarding its parameters and regularization components. Each of these small networks has a fraction of the original one's parameters. We then train these small networks together and make them see various views of the same data to increase their diversity. During this co-training process, networks can also learn from each other. As a result, small networks can achieve better ensemble performance than the large one with few or no extra parameters or FLOPs,…
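The parameter budget behind the division step can be illustrated with a little arithmetic. The sketch below uses a plain fully connected network as a stand-in, which is an assumption made for simplicity and not one of the architectures the paper evaluates. It also assumes that, because hidden-layer parameters grow roughly with the square of the width, dividing the width by the square root of the number of networks keeps the total parameter count roughly constant. The helper names and layer sizes are hypothetical.

```python
import math

def mlp_param_count(in_dim, width, depth, out_dim):
    """Weights plus biases of an MLP with `depth` hidden layers of size `width`."""
    dims = [in_dim] + [width] * depth + [out_dim]
    return sum(a * b + b for a, b in zip(dims, dims[1:]))

def divided_width(width, num_networks):
    """Assumed rule: shrink the width by sqrt(num_networks), since the
    hidden-to-hidden weights scale with width squared."""
    return max(1, round(width / math.sqrt(num_networks)))

num_networks = 4
large = mlp_param_count(in_dim=64, width=512, depth=4, out_dim=10)
small_width = divided_width(512, num_networks)
small_total = num_networks * mlp_param_count(64, small_width, 4, 10)

print("large network parameters:", large)
print("width of each small network:", small_width)
print("total parameters of the small networks:", small_total)
print("ratio:", small_total / large)
```

Under these assumptions the four small networks together stay within a few percent of the large network's parameter count, which matches the abstract's claim of "few or no extra parameters" in spirit only; the exact figures depend on the architecture.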
