A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay

Leslie N. Smith

arXiv:1803.09820·cs.LG·April 25, 2018·828 cites

Open Access · 5 Repos

TL;DR

This paper presents practical methods for tuning key neural network hyper-parameters like learning rate, batch size, momentum, and weight decay to reduce training time and improve model performance.

Contribution

It introduces guidelines for analyzing training loss and adjusting hyper-parameters systematically, emphasizing the importance of balancing regularization with other parameters.

Findings

01. Proper hyper-parameter tuning reduces training time.

02. Balancing regularization improves model performance.

03. Optimal weight decay is linked to learning rate and momentum.

Abstract

Although deep learning has produced dazzling successes for applications of image, speech, and video processing in the past few years, most training runs use suboptimal hyper-parameters, requiring unnecessarily long training times. Setting the hyper-parameters remains a black art that requires years of experience to acquire. This report proposes several efficient ways to set the hyper-parameters that significantly reduce training time and improve performance. Specifically, this report shows how to examine the training validation/test loss function for subtle clues of underfitting and overfitting and suggests guidelines for moving toward the optimal balance point. Then it discusses how to increase/decrease the learning rate/momentum to speed up training. Our experiments show that it is crucial to balance every manner of regularization for each dataset and architecture. Weight decay is…
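The abstract's point about increasing and decreasing the learning rate and momentum can be illustrated with a small sketch. The function below is a simplified, linear triangular cycle in which the learning rate rises and then falls while the momentum moves in the opposite direction. The function name `cyclical_lr_momentum`, its parameters, and the default bounds are illustrative assumptions, not values or code taken from the paper, and the sketch does not attempt to reproduce the paper's exact schedule.

```python
def cyclical_lr_momentum(step, total_steps,
                         lr_min=0.01, lr_max=0.1,
                         mom_min=0.85, mom_max=0.95):
    """Return (learning_rate, momentum) for one step of a single linear cycle.

    Hypothetical helper: all bounds are placeholder values chosen for
    illustration only.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")

    half = (total_steps - 1) / 2.0
    # frac is 0 at the first and last step and 1 at the midpoint.
    frac = 1.0 - abs(step - half) / half

    # Learning rate ramps up to lr_max, then back down to lr_min.
    lr = lr_min + frac * (lr_max - lr_min)
    # Momentum moves opposite to the learning rate.
    momentum = mom_max - frac * (mom_max - mom_min)
    return lr, momentum


if __name__ == "__main__":
    for s in range(11):
        lr, mom = cyclical_lr_momentum(s, 11)
        print(f"step {s:2d}  lr={lr:.3f}  momentum={mom:.3f}")
```

In practice the returned pair would be written into the optimizer's settings before each update. The bounds themselves would need to be chosen per dataset and architecture, which is the balancing problem the abstract describes.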

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · 1cycle learning rate scheduling policy · Weight Decay