Accelerated Gradient-free Neural Network Training by Multi-convex Alternating Optimization

Junxiang Wang, Hongyi Li, and Liang Zhao

arXiv:1811.04187 · math.OC · February 17, 2022 · Neurocomputing · 1 citation

TL;DR

This paper introduces a novel monotonic alternating minimization algorithm for neural network training that overcomes limitations of SGD, offering theoretical convergence guarantees and accelerated performance.

Contribution

The paper proposes the mDLAM algorithm with an inequality-constrained formulation, enabling convergence proofs independent of hyperparameters and achieving fast linear convergence with Nesterov acceleration.

Findings

01 Demonstrates convergence and efficiency on benchmark datasets

02 Achieves a linear convergence rate with Nesterov acceleration

03 Outperforms traditional gradient-based methods in experiments

Abstract

Although Stochastic Gradient Descent (SGD) and its variants are well known for training neural networks, they suffer from limitations such as a lack of theoretical guarantees, vanishing gradients, and excessive sensitivity to input. To overcome these drawbacks, alternating minimization methods have attracted rapidly growing attention. As an emerging and open domain, however, they raise several new challenges: 1) convergence properties are sensitive to penalty parameters, and 2) the theoretical convergence rate is slow. We therefore propose a novel monotonous Deep Learning Alternating Minimization (mDLAM) algorithm to address these two challenges. Our inequality-constrained formulation approximates the original problem with non-convex equality constraints arbitrarily closely, enabling our convergence proof of the proposed mDLAM algorithm…
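To make the idea of monotone, gradient-free alternating minimization concrete, here is a minimal sketch on a toy bi-convex problem: a regularized rank-1 factorization of a small matrix. It is not the mDLAM algorithm and does not reproduce the paper's inequality-constrained formulation. The problem, the function names (`objective`, `solve_u`, `solve_v`, `monotone_alternating_min`), and the parameters (`lam`, `iters`, `seed`) are all assumptions chosen for illustration. Each block is updated by solving its convex subproblem in closed form, a Nesterov-style extrapolation is tried first, and the step falls back to plain alternating minimization whenever the extrapolated step would increase the objective.

```python
import random


def objective(M, u, v, lam):
    """0.5 * ||M - u v^T||_F^2 + 0.5 * lam * (||u||^2 + ||v||^2)."""
    fit = sum((M[i][j] - u[i] * v[j]) ** 2
              for i in range(len(u)) for j in range(len(v)))
    reg = sum(x * x for x in u) + sum(x * x for x in v)
    return 0.5 * fit + 0.5 * lam * reg


def solve_u(M, v, lam):
    """Exact minimizer over u with v fixed (a convex ridge problem)."""
    denom = sum(x * x for x in v) + lam
    return [sum(M[i][j] * v[j] for j in range(len(v))) / denom
            for i in range(len(M))]


def solve_v(M, u, lam):
    """Exact minimizer over v with u fixed (a convex ridge problem)."""
    denom = sum(x * x for x in u) + lam
    return [sum(M[i][j] * u[i] for i in range(len(u))) / denom
            for j in range(len(M[0]))]


def monotone_alternating_min(M, lam=1.0, iters=50, seed=0):
    """Toy alternating minimization with extrapolation and a monotone safeguard."""
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in M]
    v = [rng.uniform(-1.0, 1.0) for _ in M[0]]
    v_prev = v[:]
    t = 1.0
    history = [objective(M, u, v, lam)]
    for _ in range(iters):
        # Nesterov-style momentum weight (0 on the first iteration).
        t_next = 0.5 * (1.0 + (1.0 + 4.0 * t * t) ** 0.5)
        beta = (t - 1.0) / t_next
        # Candidate step: solve for u at an extrapolated v, then solve for v.
        v_bar = [c + beta * (c - p) for c, p in zip(v, v_prev)]
        u_new = solve_u(M, v_bar, lam)
        v_new = solve_v(M, u_new, lam)
        if objective(M, u_new, v_new, lam) > history[-1]:
            # Safeguard: plain alternating minimization cannot increase the
            # objective, because each block update is an exact minimizer.
            u_new = solve_u(M, v, lam)
            v_new = solve_v(M, u_new, lam)
            t_next = 1.0  # restart the momentum sequence
        u, v_prev, v, t = u_new, v, v_new, t_next
        history.append(objective(M, u, v, lam))
    return u, v, history
```

Calling `monotone_alternating_min([[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]])` returns the two factors and the objective history, which is non-increasing up to floating-point rounding. No gradients or learning rates are involved; the only tunable quantities in this sketch are the regularization weight and the iteration count.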

Code & Models

Repositories

xianggebenben/mdlam
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications