An Effective Training Method For Deep Convolutional Neural Network

Yang Jiang; Zeyang Dou; Qun Hao; Jie Cao; Kun Gao; Xi Chen

arXiv:1708.01666·cs.LG·October 18, 2017

TL;DR

This paper introduces a nonlinearity generation method for deep CNNs. It modifies activation functions so that they start out linear and symmetric and gain nonlinearity during training, which acts as a form of regularization. The method accelerates training, stabilizes convergence, and enables training of very deep models.

Contribution

The paper proposes a novel nonlinearity generation technique that improves training speed and stability of deep CNNs by dynamically adjusting activation functions during training.

Findings

01. Speeds up convergence of deep CNN training

02. Reduces sensitivity to weight initialization

03. Enables training of very deep models with minimal extra cost

Abstract

In this paper, we propose the nonlinearity generation method to speed up and stabilize the training of deep convolutional neural networks. The proposed method modifies a family of activation functions as nonlinearity generators (NGs). NGs make the activation functions linear symmetric for their inputs to lower model capacity, and automatically introduce nonlinearity to enhance the capacity of the model during training. The proposed method can be considered an unusual form of regularization: the model parameters are obtained by training a relatively low-capacity model, which is relatively easy to optimize at the beginning, for only a few iterations, and these parameters are reused to initialize a higher-capacity model. We derive the upper and lower bounds of the variance of the weight variation, and show that the initial symmetric structure of NGs helps stabilize training. We…
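The visible part of the abstract does not give the functional form of an NG, so the following is only an illustrative sketch of the general idea. It assumes a ReLU with a movable threshold `t`, written max(x, t); the names `shifted_relu`, `clipped_fraction`, and `t` are hypothetical and are not taken from the paper. With a strongly negative threshold the function is the identity on typical inputs (linear and odd-symmetric, hence low capacity), and as the threshold moves toward zero it becomes an ordinary ReLU (nonlinear, higher capacity).

```python
import numpy as np

def shifted_relu(x, t):
    # Hypothetical NG-style activation (an assumption, not the paper's
    # definition): a ReLU whose kink sits at threshold t instead of 0.
    return np.maximum(x, t)

def clipped_fraction(x, t):
    # Fraction of inputs that fall in the flat (nonlinear) region x < t.
    return float(np.mean(x < t))

# Sample pre-activations: [-3, -2, -1, 0, 1, 2, 3]
x = np.linspace(-3.0, 3.0, 7)

# Early training (assumed): threshold far below the inputs, so the
# activation passes every value through unchanged and acts linearly.
early = shifted_relu(x, t=-10.0)

# Later training (assumed): threshold has moved to 0, so the activation
# is a standard ReLU and negative inputs are clipped.
late = shifted_relu(x, t=0.0)
```

In this sketch, `clipped_fraction` gives a rough measure of how much nonlinearity is active at a given threshold: it is zero while the threshold sits below all inputs and grows as the threshold rises.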


Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings