TL;DR
This paper introduces a learnable weight repetition strategy for CNNs that enhances performance without increasing parameter count, demonstrating that weight sharing significantly contributes to improvements in various CNN architectures.
Contribution
The paper proposes a novel learnable weight repetition method for CNNs, showing it can boost performance and explain part of the gains from group-equivariant CNNs without increasing parameters.
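As a rough illustration of what repeating a layer's weights can look like, the NumPy sketch below builds a wider convolution kernel bank by tiling one shared bank several times. The per-copy scalar `gains`, the function name `repeat_weights`, and all shapes are assumptions made for this example; the summary does not specify the paper's actual repetition scheme, and this sketch adds a few scalars rather than keeping the parameter count exactly fixed.

```python
import numpy as np


def repeat_weights(base, gains):
    """Tile a shared kernel bank once per entry of `gains` (hypothetical scheme).

    base:  array of shape (c_out, c_in, k, k), the stored kernel bank
    gains: array of shape (r,), one assumed learnable scale per repeated copy
    Returns an array of shape (r * c_out, c_in, k, k).
    """
    # Broadcast (r, 1, 1, 1, 1) against (1, c_out, c_in, k, k) to get r scaled copies.
    copies = gains[:, None, None, None, None] * base[None]
    # Stack the copies along the output-channel axis.
    return copies.reshape(-1, *base.shape[1:])


rng = np.random.default_rng(0)
base = rng.standard_normal((8, 3, 3, 3))   # stored weights: 8 * 3 * 3 * 3 = 216 values
gains = np.array([1.0, 0.5, -1.0, 2.0])    # 4 extra scalars, one per copy

wide = repeat_weights(base, gains)

stored = base.size + gains.size            # values that would actually be optimized
effective = wide.size                      # values the rescaled layer convolves with
print(wide.shape, stored, effective)
```

In this toy setting the layer becomes four times wider while the number of stored values grows only by the four gains, which is the kind of decoupling between network size and parameter count that the summary describes.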
Findings
Small rescaled base networks achieve performance comparable to deeper networks with as few as 6% of the deeper network's optimization parameters.
Learnable weight sharing accounts for a significant portion of improvements in group-equivariant CNNs.
Up to 40% of the gains in rotation-equivariant CNNs may be due to learned weight repetition.
Abstract
Deeper and wider CNNs are known to provide improved performance for deep learning tasks. However, most such networks have poor performance gain per parameter increase. In this paper, we investigate whether the gain observed in deeper models is purely due to the addition of more optimization parameters or whether the physical size of the network also plays a role. Further, we present a novel rescaling strategy for CNNs based on learnable repetition of their parameters. Based on this strategy, we rescale CNNs without changing their parameter count, and show that learnable sharing of weights can itself provide a significant boost in the performance of any given model. We show that small base networks, when rescaled, can provide performance comparable to deeper networks with as few as 6% of the optimization parameters of the deeper one. The relevance of…
