Layer-Specific Adaptive Learning Rates for Deep Networks

Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Goldstein, and Gavin Taylor

arXiv:1510.04609·cs.CV·October 16, 2015

TL;DR

This paper introduces a layer-specific, adaptive learning rate method for deep neural networks that accelerates training and improves accuracy by addressing vanishing gradients and saddle points.

Contribution

It proposes a novel optimization technique that adjusts learning rates per layer based on curvature, enhancing training speed and performance in deep networks.

Findings

1. Faster convergence on the MNIST, CIFAR10, and ImageNet datasets.

2. Improved accuracy over standard algorithms.

3. Reduced training time for deep networks.

Abstract

The increasing complexity of deep learning architectures has pushed training times to weeks or even months. This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely large for weights connecting deep layers (layers near the output layer) and extremely small for shallow layers (near the input layer); this results in slow learning in the shallow layers. It has also been shown that highly non-convex problems, such as training deep neural networks, exhibit a proliferation of high-error, low-curvature saddle points, which slows learning dramatically. In this paper, we attempt to overcome these two problems by proposing an optimization method for training deep neural networks that uses learning rates that are both specific to each layer in the network and adaptive to the curvature of the function,…
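The abstract is cut off before it states the update rule, so the following is only a minimal sketch of what a layer-specific, adaptive learning rate could look like. The scaling rule used here (a base rate multiplied by 1 + log(1 + 1/‖g‖), where g is the layer's gradient) is an assumption chosen for illustration, as are the names `layer_learning_rate`, `sgd_step`, and `eps`. The idea it shows is that a layer with a small gradient norm, as in a shallow layer or near a low-curvature saddle point, receives a larger step than a layer with a large gradient norm.

```python
import math


def layer_learning_rate(base_lr, grad, eps=1e-12):
    """Hypothetical per-layer rate that grows as the layer's gradient norm shrinks.

    The formula is an illustrative assumption, not taken from the text above.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    # eps guards against division by zero when the gradient vanishes entirely.
    return base_lr * (1.0 + math.log(1.0 + 1.0 / (norm + eps)))


def sgd_step(weights, grads, base_lr=0.1):
    """Apply one gradient-descent step with a separate learning rate per layer."""
    new_weights = []
    for w, g in zip(weights, grads):
        lr = layer_learning_rate(base_lr, g)
        new_weights.append([wi - lr * gi for wi, gi in zip(w, g)])
    return new_weights


# Toy example: a shallow layer with tiny gradients and a deep layer with large ones.
weights = [[1.0, 1.0], [1.0, 1.0]]
grads = [[1e-4, 1e-4], [1.0, 1.0]]

shallow_lr = layer_learning_rate(0.1, grads[0])
deep_lr = layer_learning_rate(0.1, grads[1])
updated = sgd_step(weights, grads)

print(shallow_lr > deep_lr)  # the small-gradient layer gets the larger rate
```

In this toy setup each layer is a flat list of weights; a real network would compute one norm per layer's parameter tensor in the same way.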
