Nonlinear Initialization Methods for Low-Rank Neural Networks
Kiran Vodrahalli, Rakesh Shivanna, Maheswaran Sathiamoorthy, Sagar Jain, Ed H. Chi

TL;DR
This paper introduces a new low-rank initialization method for deep neural networks that focuses on function approximation rather than parameter approximation, providing theoretical insights and practical algorithms validated on ImageNet.
Contribution
It presents a novel low-rank initialization framework, proves the computational tractability of low-rank ReLU approximation, and demonstrates improved training performance on large-scale models.
Findings
A provable, significant gap exists between parameter approximation and function approximation for low-rank ReLU networks.
First provably efficient algorithm for ReLU low-rank approximation.
Validated approach on ResNet and EfficientNet models on ImageNet.
Abstract
We propose a novel low-rank initialization framework for training low-rank deep neural networks -- networks where the weight parameters are re-parameterized by products of two low-rank matrices. The most successful existing approach, spectral initialization, draws a sample from the initialization distribution for the full-rank setting and then optimally approximates the full-rank initialization parameters in the Frobenius norm with a pair of low-rank initialization matrices via singular value decomposition. Our method is inspired by the insight that approximating the function corresponding to each layer is more important than approximating the parameter values. We provably demonstrate that there is a significant gap between these two approaches for ReLU networks, particularly as the desired rank of the approximating weights decreases, or as the dimension of the inputs to the layer…
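The spectral initialization baseline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names (`spectral_init`, `function_error`), the layer sizes, the scaled-Gaussian full-rank initialization, and the choice of standard Gaussian inputs for measuring the layer's function error are all assumptions made for the example. The sketch computes the parameter error (squared Frobenius norm) of the truncated SVD and, separately, a Monte Carlo estimate of how far the ReLU layer's output moves, which is the quantity the abstract argues matters more.

```python
import numpy as np

def spectral_init(W, rank):
    """Split the best rank-`rank` Frobenius approximation of W into two factors."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:rank])           # share each singular value between the factors
    A = U[:, :rank] * root             # shape (d_out, rank)
    B = root[:, None] * Vt[:rank]      # shape (rank, d_in)
    return A, B

def relu(x):
    return np.maximum(x, 0.0)

def function_error(W, A, B, n_samples=2000, seed=0):
    """Monte Carlo estimate of E ||relu(W x) - relu(A B x)||^2.

    Assumption: inputs x are drawn from a standard Gaussian N(0, I).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((W.shape[1], n_samples))
    diff = relu(W @ X) - relu(A @ (B @ X))
    return float(np.mean(np.sum(diff ** 2, axis=0)))

# Hypothetical layer sizes and a scaled-Gaussian full-rank initialization.
rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 32, 8
W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)

A, B = spectral_init(W, rank)
param_err = float(np.linalg.norm(W - A @ B) ** 2)   # squared Frobenius error
func_err = function_error(W, A, B)                  # error of the ReLU layer output

print(f"parameter error: {param_err:.4f}")
print(f"function error:  {func_err:.4f}")
```

Spectral initialization minimizes only the first number. A function-approximation approach would instead choose the two factors to reduce the second, which is where the abstract locates the gap for ReLU layers.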
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Medical Image Segmentation Techniques · Statistical and numerical algorithms
Methods: Depthwise Convolution · Pointwise Convolution · Average Pooling · RMSProp · Dense Connections · Depthwise Separable Convolution · Squeeze-and-Excitation Block · Residual Connection · Batch Normalization · Inverted Residual Block
