Understand the Effectiveness of Shortcuts through the Lens of DCA
Youran Sun, Yihua Liu, Yi-Shuai Niu

TL;DR
This paper explores how the Difference-of-Convex Algorithm (DCA) framework helps explain the effectiveness of shortcut connections in neural networks and introduces a new architecture called NegNet.
Contribution
It demonstrates that the gradients of shortcut neural networks can be derived by applying DCA to vanilla networks, and it proposes NegNet, a new architecture that does not fit this DCA interpretation yet performs comparably to ResNet.
Findings
Shortcut gradients can be obtained through DCA applied to vanilla networks.
NegNet performs on par with ResNet despite differing in structure.
DCA provides a theoretical lens to understand shortcut effectiveness.
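The "shortcut gradient" in the findings above is the extra identity path that a skip connection adds to backpropagation. For a block y = x + F(x), the chain rule gives the Jacobian I + ∂F/∂x, whereas the vanilla block y = F(x) has Jacobian ∂F/∂x alone. The sketch below checks this numerically with central finite differences. It assumes a hypothetical one-layer branch F(x) = tanh(Wx) and illustrates only the standard chain-rule fact; it does not reproduce the paper's DCA derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
W = 0.5 * rng.standard_normal((4, 4))  # hypothetical weights for the branch F

def F(x):
    # Hypothetical residual branch: a single tanh layer.
    return np.tanh(W @ x)

def vanilla(x):
    # Block without a shortcut connection.
    return F(x)

def residual(x):
    # Block with a shortcut: identity path plus residual branch.
    return x + F(x)

def jac_F(x):
    # Analytic Jacobian of tanh(Wx): diag(1 - tanh^2(Wx)) @ W.
    return np.diag(1.0 - np.tanh(W @ x) ** 2) @ W

def numerical_jacobian(fn, x, eps=1e-6):
    # Central finite differences, one column per input coordinate.
    n = x.size
    J = np.zeros((fn(x).size, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (fn(x + e) - fn(x - e)) / (2 * eps)
    return J

x = rng.standard_normal(4)
J_vanilla = numerical_jacobian(vanilla, x)
J_residual = numerical_jacobian(residual, x)

# The vanilla Jacobian matches dF/dx; the shortcut adds exactly the identity.
print(np.allclose(J_vanilla, jac_F(x), atol=1e-6))
print(np.allclose(J_residual, np.eye(4) + jac_F(x), atol=1e-6))
```

The identity term is why gradients can reach earlier layers even when ∂F/∂x is small.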
Abstract
Difference-of-Convex Algorithm (DCA) is a well-known algorithm for minimizing a nonconvex function that can be expressed as the difference of two convex functions. Many well-known optimization algorithms, such as SGD and proximal point methods, can be viewed as special cases of DCA with specific DC decompositions, which makes DCA a powerful framework for optimization. On the other hand, shortcuts are a key architectural feature of modern deep neural networks, facilitating both training and optimization. We show that the gradient of a shortcut neural network can be obtained by applying DCA to vanilla neural networks, i.e., networks without shortcut connections. Therefore, from the perspective of DCA, we can better understand the effectiveness of networks with shortcuts. Moreover, we propose a new architecture called NegNet that does not fit the previous interpretation but performs on…
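To make the claim that gradient descent is a special DCA concrete, here is a minimal sketch (an illustration, not code from the paper). If f is L-smooth, then f = g - h with g(x) = (L/2)||x||² and h(x) = (L/2)||x||² - f(x), and both g and h are convex. The DCA step x_{k+1} = argmin_x g(x) - <∇h(x_k), x> then has the closed form x_{k+1} = ∇h(x_k)/L = x_k - ∇f(x_k)/L, which is a gradient step with step size 1/L. The least-squares test problem and the function names below are hypothetical.

```python
import numpy as np

def dca_step(x, grad_f, L):
    # DC decomposition f = g - h with g(x) = (L/2)||x||^2 and
    # h(x) = (L/2)||x||^2 - f(x); h is convex because f is L-smooth.
    grad_h = L * x - grad_f(x)  # linearize h at the current iterate
    # Convex subproblem: argmin_y (L/2)||y||^2 - <grad_h, y>  =>  y = grad_h / L.
    return grad_h / L

def gradient_step(x, grad_f, L):
    # Plain gradient descent with step size 1/L.
    return x - grad_f(x) / L

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

# f(x) = 0.5 * ||A x - b||^2 is L-smooth with L = largest eigenvalue of A^T A.
grad_f = lambda x: A.T @ (A @ x - b)
L = np.linalg.eigvalsh(A.T @ A).max()

x_dca = np.zeros(5)
x_gd = np.zeros(5)
for _ in range(500):
    x_dca = dca_step(x_dca, grad_f, L)
    x_gd = gradient_step(x_gd, grad_f, L)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]  # reference minimizer
print(np.allclose(x_dca, x_gd))
print(np.linalg.norm(x_dca - x_star))
```

The two iterations coincide up to floating-point rounding, so choosing the DC decomposition is what turns DCA into gradient descent here. Other decompositions yield other algorithms.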
Taxonomy
Methods: Average Pooling · Kaiming Initialization · Global Average Pooling · Stochastic Gradient Descent · Max Pooling · Convolution
