Inertial Proximal Deep Learning Alternating Minimization for Efficient Neutral Network Training

Linbo Qiao; Tao Sun; Hengyue Pan; Dongsheng Li

arXiv:2102.00267·cs.LG·February 2, 2021

Open Access

TL;DR

This paper introduces iPDLAM, an improved Deep Learning Alternating Minimization (DLAM) training algorithm that combines inertial extrapolation with a warm-up schedule for the penalty parameter. DLAM itself is an alternative to Stochastic Gradient Descent (SGD).

Contribution

It develops an inertial proximal alternating minimization method for neural network training, adding a penalty-parameter warm-up (start small, increase over the iterations) to speed up training further.

Findings

01 Demonstrates faster convergence on real-world datasets

02 Shows improved training efficiency over standard methods

03 Validates effectiveness through numerical experiments

Abstract

In recent years, Deep Learning Alternating Minimization (DLAM), which is alternating minimization applied to the penalty form of deep neural network training, has been developed as an alternative algorithm that overcomes several drawbacks of Stochastic Gradient Descent (SGD) algorithms. This work develops an improved DLAM based on the well-known inertial technique, namely iPDLAM, which predicts a point by linearization of the current and last iterates. To gain further training speed, we apply a warm-up technique to the penalty parameter, that is, we start with a small initial value and increase it over the iterations. Numerical results on real-world datasets are reported to demonstrate the efficiency of the proposed algorithm.
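The three ingredients named in the abstract (alternating minimization on a penalty form, an inertial prediction from the current and last iterates, and a penalty-parameter warm-up) can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a single linear "layer" with one auxiliary variable `z`, the objective `0.5*||z - y||^2 + (rho/2)*||z - X w||^2`, and closed-form block updates. The function name `inertial_prox_am` and the parameters `beta` (inertial weight), `delta` (proximal weight), `rho0`, `rho_max`, and `growth` (warm-up schedule) are hypothetical names chosen for illustration.

```python
import numpy as np


def inertial_prox_am(X, y, iters=300, beta=0.2, delta=1.0,
                     rho0=0.1, rho_max=10.0, growth=1.1):
    """Toy inertial proximal alternating minimization with penalty warm-up.

    Assumed objective (illustrative only, not taken from the paper):
        0.5 * ||z - y||^2 + (rho / 2) * ||z - X w||^2   over (w, z).
    """
    n, d = X.shape
    w, z = np.zeros(d), np.zeros(n)
    w_prev, z_prev = w.copy(), z.copy()
    rho = rho0                      # warm-up: start with a small penalty
    gram = X.T @ X
    eye = np.eye(d)

    for _ in range(iters):
        # Inertial prediction for w from the current and last iterates.
        w_hat = w + beta * (w - w_prev)
        w_prev = w
        # w-block: minimize (rho/2)||z - X w||^2 + (delta/2)||w - w_hat||^2.
        w = np.linalg.solve(rho * gram + delta * eye,
                            rho * (X.T @ z) + delta * w_hat)

        # Inertial prediction for z, then the z-block in closed form:
        # minimize 0.5||z - y||^2 + (rho/2)||z - X w||^2 + (delta/2)||z - z_hat||^2.
        z_hat = z + beta * (z - z_prev)
        z_prev = z
        z = (y + rho * (X @ w) + delta * z_hat) / (1.0 + rho + delta)

        # Warm-up: increase the penalty parameter, capped at rho_max.
        rho = min(rho * growth, rho_max)

    return w, z
```

Calling `inertial_prox_am(X, y)` with a design matrix `X` of shape `(n, d)` and targets `y` of shape `(n,)` returns the fitted weights and the auxiliary variable. In a real network each layer would have its own weight and activation blocks, and the block updates would generally not have closed forms.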

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning