Learning Implicitly Recurrent CNNs Through Parameter Sharing
Pedro Savarese, Michael Maire

TL;DR
This paper presents a parameter sharing scheme for CNNs that creates hybrid recurrent-convolutional architectures, reducing parameters while maintaining accuracy and enabling implicit discovery of recurrent structures.
Contribution
The authors introduce a parameter sharing method that hybridizes CNNs and recurrent networks, improving parameter efficiency while reaching accuracy competitive with NAS-based architectures.
Findings
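Substantial parameter savings on standard image classification tasks, with accuracy maintained.
Although sharing is defined through soft weights, trained networks often show near-strict recurrent structure and convert into networks with actual loops with negligible side effects.
On algorithmic tasks, the hybrid networks improve both training speed and extrapolation.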
Abstract
We introduce a parameter sharing scheme, in which different layers of a convolutional neural network (CNN) are defined by a learned linear combination of parameter tensors from a global bank of templates. Restricting the number of templates yields a flexible hybridization of traditional CNNs and recurrent networks. Compared to traditional CNNs, we demonstrate substantial parameter savings on standard image classification tasks, while maintaining accuracy. Our simple parameter sharing scheme, though defined via soft weights, in practice often yields trained networks with near strict recurrent structure; with negligible side effects, they convert into networks with actual loops. Training these networks thus implicitly involves discovery of suitable recurrent architectures. Though considering only the design aspect of recurrent links, our trained networks achieve accuracy competitive…
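The sharing scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the names `layer_weights` and `coefficient_similarity`, the tensor sizes, and the hand-set coefficients are all hypothetical, and it assumes every sharing layer has the same weight shape. Each layer's weights are a linear combination of a small bank of templates, and comparing the coefficient vectors of different layers (here by absolute cosine similarity, an assumed choice) is one plausible way to spot layers that have collapsed onto the same weights, i.e. a candidate loop.

```python
import numpy as np

def layer_weights(alpha, templates):
    """Combine k shared templates into one weight tensor per layer.

    alpha:     (num_layers, k) mixing coefficients (learned in practice)
    templates: (k, out_ch, in_ch, kh, kw) global bank of parameter tensors
    returns:   (num_layers, out_ch, in_ch, kh, kw)
    """
    # Sum over the template axis: W[l] = sum_j alpha[l, j] * templates[j]
    return np.tensordot(alpha, templates, axes=([1], [0]))

def coefficient_similarity(alpha):
    """Absolute cosine similarity between coefficient vectors of each layer pair."""
    unit = alpha / np.linalg.norm(alpha, axis=1, keepdims=True)
    return np.abs(unit @ unit.T)

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 4 layers sharing k = 2 templates.
templates = rng.standard_normal((2, 16, 16, 3, 3))

# Hand-set coefficients standing in for learned ones: layers 0 and 2 use
# template 0, layers 1 and 3 use template 1, so the stack repeats with period 2.
alpha = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])

weights = layer_weights(alpha, templates)
sim = coefficient_similarity(alpha)

# Parameter count with sharing (templates + coefficients) vs. without.
shared_params = templates.size + alpha.size
unshared_params = alpha.shape[0] * templates[0].size
```

With fewer templates than layers, `shared_params` is smaller than `unshared_params`, and entries of `sim` close to 1 mark layer pairs whose weights are effectively identical up to scale.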
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
Methods: Sigmoid Activation · Tanh Activation · Softmax · Long Short-Term Memory
