Singular Value Perturbation and Deep Network Optimization
Rudolf H. Riedi, Randall Balestriero, Richard G. Baraniuk

TL;DR
This paper provides a theoretical analysis of how matrix perturbation affects deep network optimization, explaining why architectures with skip connections, such as ResNets and DenseNets, are easier to optimize than traditional ConvNets.
Contribution
It introduces new perturbation results for the singular values of deep network layers and uses them to explain analytically the optimization advantages of skip-connection architectures.
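A standard baseline for singular-value perturbation results is Weyl's inequality: for matrices A and E of the same shape, |σ_i(A + E) − σ_i(A)| ≤ ‖E‖₂ for every index i. The sketch below checks that classical bound numerically with NumPy on random matrices. It illustrates the textbook inequality only, not the paper's new results, and the matrix size and perturbation scale are arbitrary choices.

```python
import numpy as np

# Numerical check of Weyl's inequality for singular values:
# |sigma_i(A + E) - sigma_i(A)| <= ||E||_2 for every index i.
rng = np.random.default_rng(0)

n = 6
A = rng.standard_normal((n, n))          # unperturbed matrix
E = 1e-2 * rng.standard_normal((n, n))   # small perturbation (arbitrary scale)

sigma_A = np.linalg.svd(A, compute_uv=False)       # descending order
sigma_AE = np.linalg.svd(A + E, compute_uv=False)  # descending order
spectral_norm_E = np.linalg.norm(E, 2)             # largest singular value of E

max_shift = np.max(np.abs(sigma_AE - sigma_A))
print(f"largest singular value shift: {max_shift:.2e}")
print(f"bound ||E||_2:                {spectral_norm_E:.2e}")
assert max_shift <= spectral_norm_E + 1e-12
```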
Findings
Skip connections lead to more stable singular values.
Networks with skip connections have less erratic loss surfaces.
Activation functions influence singular value behavior independently of architecture.
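The findings on skip connections and activations can be illustrated with a toy calculation (an illustration of the general mechanism, not the paper's analysis). On a fixed linear region, a ReLU-type layer acts as the matrix D(x)W, where D(x) is diagonal with entry 1 for each active unit and the activation's negative slope (0 for ReLU) for each inactive one; adding an identity skip connection gives I + D(x)W. Because D(x) has entries in [0, 1], Weyl's inequality gives σ_min(I + D(x)W) ≥ 1 − ‖W‖₂, whereas ReLU zeroes a row of D(x)W for every inactive unit, which makes that square matrix singular. The weights, the rescaling to ‖W‖₂ = 0.5, and the leaky-ReLU slope of 0.1 are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Hypothetical square weight matrix, rescaled so that ||W||_2 = 0.5.
W = rng.standard_normal((n, n))
W *= 0.5 / np.linalg.norm(W, 2)
x = rng.standard_normal(n)
b = np.zeros(n)
pre = W @ x + b  # pre-activations; their signs fix the linear region

def mask(pre, slope):
    """Diagonal of D(x): 1 for active units, `slope` for inactive ones.
    slope=0.0 is ReLU; 0 < slope < 1 is leaky ReLU."""
    return np.where(pre > 0, 1.0, slope)

def sigma_min(M):
    """Smallest singular value (svd returns them in descending order)."""
    return np.linalg.svd(M, compute_uv=False)[-1]

results = {}
for name, slope in [("relu", 0.0), ("leaky_relu", 0.1)]:
    DW = mask(pre, slope)[:, None] * W      # D(x) W via row scaling
    plain = sigma_min(DW)                   # no skip connection
    residual = sigma_min(np.eye(n) + DW)    # identity skip connection
    results[name] = (plain, residual)
    print(f"{name:>10}: sigma_min plain = {plain:.3e}, with skip = {residual:.3f}")
```

With ReLU, any inactive unit drives the plain layer's smallest singular value to zero (up to rounding), while the skip-connected version stays at or above 0.5 for both activations, as the bound predicts.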
Abstract
We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network. In particular, we explain analytically what deep learning practitioners have long observed empirically: the parameters of some deep architectures (e.g., residual networks or ResNets, and dense networks or DenseNets) are easier to optimize than others (e.g., convolutional networks or ConvNets). Building on our earlier work connecting deep networks with continuous piecewise-affine splines, we develop an exact local linear representation of a deep network layer for a family of modern deep networks that includes ConvNets at one end of a spectrum and ResNets, DenseNets, and other networks with skip connections at the other. For regression and classification tasks that optimize the squared-error loss, we show that the optimization loss surface of a modern deep…
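The exact local linear representation mentioned in the abstract can be sketched for a simple layer family. Here a hypothetical scalar `eps` interpolates between a plain layer (eps = 0) and an identity skip connection (eps = 1); this parametrization is an assumption for illustration, not necessarily the one used in the paper. Because a (leaky) ReLU multiplies each pre-activation by either 1 or its negative slope, the layer equals (eps·I + D(x)W)x + D(x)b on the linear region containing x, which the code checks.

```python
import numpy as np

def leaky_relu(z, slope):
    """slope=0.0 gives ReLU."""
    return np.where(z > 0, z, slope * z)

def layer(x, W, b, eps, slope=0.0):
    """Hypothetical layer family: eps=0 is a plain (ConvNet-style) layer,
    eps=1 adds an identity skip connection (ResNet-style)."""
    return eps * x + leaky_relu(W @ x + b, slope)

def local_affine(x, W, b, eps, slope=0.0):
    """Return (A, c) with layer(x) == A @ x + c on the region containing x."""
    d = np.where(W @ x + b > 0, 1.0, slope)      # diagonal of D(x)
    A = eps * np.eye(len(x)) + d[:, None] * W    # eps*I + D(x) W
    c = d * b                                    # D(x) b
    return A, c

rng = np.random.default_rng(2)
n = 5
W = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = rng.standard_normal(n)

for eps in (0.0, 1.0):
    A, c = local_affine(x, W, b, eps)
    assert np.allclose(layer(x, W, b, eps), A @ x + c)
```

Away from region boundaries, the matrix A is also the layer's Jacobian at x.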
Taxonomy
Methods: Residual Connection · Residual Block · Bottleneck Residual Block · Kaiming Initialization · Dropout · 1x1 Convolution · Convolution · Dense Connections · Max Pooling
