Neural Network Training Techniques Regularize Optimization Trajectory: An Empirical Study

Cheng Chen; Junjie Yang; Yi Zhou

arXiv:2011.06702·cs.LG·March 5, 2024·1 cites

Open Access

TL;DR

This paper empirically investigates how common training techniques for deep neural networks regularize the optimization trajectory: they keep model updates aligned with the trajectory direction, which is associated with faster convergence.

Contribution

It introduces a regularity principle explaining the effect of training techniques on DNN optimization and provides theoretical and empirical evidence of its impact on convergence.

Findings

01. Training techniques improve convergence speed.

02. Successful trainings align model updates with the trajectory.

03. Regularity parameter correlates with training effectiveness.

Abstract

Modern deep neural network (DNN) trainings utilize various training techniques, e.g., nonlinear activation functions, batch normalization, skip-connections, etc. Despite their effectiveness, it is still mysterious how they help accelerate DNN trainings in practice. In this paper, we provide an empirical study of the regularization effect of these training techniques on DNN optimization. Specifically, we find that the optimization trajectories of successful DNN trainings consistently obey a certain regularity principle that regularizes the model update direction to be aligned with the trajectory direction. Theoretically, we show that such a regularity principle leads to a convergence guarantee in nonconvex optimization and the convergence rate depends on a regularization parameter. Empirically, we find that DNN trainings that apply the training techniques achieve a fast convergence and…
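The abstract above is truncated and does not state the regularity principle as a formula, so the following is only a rough sketch of the idea that updates should point along the trajectory. It runs plain gradient descent on a toy quadratic and, for each step, measures the cosine between the update and the direction from the current iterate to the final one. The toy loss, the cosine score, and the names `run_gd` and `alignment_scores` are illustrative assumptions, not the authors' definitions or experimental setup.

```python
import math

def grad(w):
    # Gradient of an assumed toy loss f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2).
    return [w[0], 10.0 * w[1]]

def run_gd(w0, lr, steps):
    # Plain gradient descent; returns every iterate w_0, ..., w_T.
    traj = [list(w0)]
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        traj.append(list(w))
    return traj

def cosine(u, v):
    # Cosine of the angle between two vectors; 0.0 if either is zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def alignment_scores(traj):
    # For each step k, compare the update (w_{k+1} - w_k) with the
    # direction from w_k to the final iterate w_T. This is a hypothetical
    # proxy for "update aligned with the trajectory direction".
    w_final = traj[-1]
    scores = []
    for k in range(len(traj) - 1):
        update = [b - a for a, b in zip(traj[k], traj[k + 1])]
        to_end = [e - a for a, e in zip(traj[k], w_final)]
        scores.append(cosine(update, to_end))
    return scores

if __name__ == "__main__":
    trajectory = run_gd([1.0, 1.0], lr=0.05, steps=50)
    scores = alignment_scores(trajectory)
    print("min alignment:", min(scores))
    print("mean alignment:", sum(scores) / len(scores))
```

On this convex toy problem every score is positive, which is the kind of behavior the alignment idea describes; a real study would record iterates from actual DNN training runs and use the paper's own definition.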

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM