An Effective Training Method For Deep Convolutional Neural Network
Yang Jiang, Zeyang Dou, Qun Hao, Jie Cao, Kun Gao, Xi Chen

TL;DR
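This paper introduces a nonlinearity generation method for deep CNNs that accelerates training, stabilizes convergence, and enables training of very deep models. It modifies activation functions into nonlinearity generators (NGs) that start out linear and introduce nonlinearity during training, which the authors describe as a form of regularization.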
Contribution
The paper proposes a novel nonlinearity generation technique that improves training speed and stability of deep CNNs by dynamically adjusting activation functions during training.
Findings
Speeds up convergence of deep CNN training
Reduces sensitivity to weight initialization
Enables training of very deep models with minimal extra cost
Abstract
In this paper, we propose the nonlinearity generation method to speed up and stabilize the training of deep convolutional neural networks. The proposed method modifies a family of activation functions as nonlinearity generators (NGs). NGs make the activation functions linear symmetric for their inputs to lower model capacity, and automatically introduce nonlinearity to enhance the capacity of the model during training. The proposed method can be considered an unusual form of regularization: the model parameters are obtained by training a relatively low-capacity model, which is relatively easy to optimize at the beginning, for only a few iterations, and these parameters are reused for the initialization of a higher-capacity model. We derive the upper and lower bounds of the variance of the weight variation, and show that the initial symmetric structure of NGs helps stabilize training. We…
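The abstract's idea of an activation that begins linear and gains nonlinearity during training can be illustrated with a small sketch. The summary above does not give the paper's exact NG definition, so the function below is an assumption made purely for illustration: a ReLU with a movable threshold, max(x, threshold), where the names ng_relu, threshold, and fraction_clipped are hypothetical. With the threshold below every input the function is the identity (a linear, low-capacity model); as the threshold moves toward 0 it behaves like an ordinary ReLU.

```python
# Illustrative sketch only; NOT the paper's confirmed formulation.
# Assumption: a "nonlinearity generator" is modeled as max(x, threshold),
# where `threshold` stands in for a parameter that changes during training.

def ng_relu(x, threshold):
    """Hypothetical nonlinearity generator: a ReLU with a movable threshold."""
    return max(x, threshold)


def fraction_clipped(inputs, threshold):
    """Share of inputs that land on the flat (nonlinear) branch."""
    clipped = sum(1 for x in inputs if x < threshold)
    return clipped / len(inputs)


inputs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

# Threshold far below every input: the activation is the identity,
# so the layer is linear at this stage.
early = [ng_relu(x, -5.0) for x in inputs]

# Threshold moved up to 0: the activation is an ordinary ReLU,
# so the layer is now nonlinear.
late = [ng_relu(x, 0.0) for x in inputs]

print(early, fraction_clipped(inputs, -5.0))
print(late, fraction_clipped(inputs, 0.0))
```

In this toy setting, the fraction of clipped inputs is a rough proxy for how much nonlinearity the activation currently contributes. How the threshold would be initialized and updated in a real network is left open here, since the summary does not specify it.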
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
