Accelerated Gradient-free Neural Network Training by Multi-convex Alternating Optimization
Junxiang Wang, Hongyi Li, and Liang Zhao

TL;DR
This paper introduces a novel monotonic alternating minimization algorithm for neural network training that overcomes limitations of SGD, offering theoretical convergence guarantees and accelerated performance.
Contribution
The paper proposes the mDLAM algorithm with an inequality-constrained formulation, enabling convergence proofs independent of hyperparameters and achieving fast linear convergence with Nesterov acceleration.
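The combination of alternating minimization, Nesterov-style extrapolation, and a monotone objective can be illustrated on a toy problem. The sketch below is not the authors' mDLAM algorithm: it swaps in a regularized matrix factorization, which is convex in each block separately, and every name in it (`objective`, `solve_U`, `solve_V`, `monotone_accelerated_am`, `lam`) is hypothetical. Each block is minimized in closed form, one block is extrapolated with a Nesterov momentum weight, and a safeguard falls back to a plain alternating step whenever the extrapolated step would increase the objective.

```python
import numpy as np


def objective(A, U, V, lam):
    """Regularized factorization loss; convex in U for fixed V and vice versa."""
    residual = A - U @ V
    return 0.5 * np.sum(residual ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))


def solve_U(A, V, lam):
    """Exact minimizer over U with V fixed (closed-form ridge solve)."""
    gram = V @ V.T + lam * np.eye(V.shape[0])
    return np.linalg.solve(gram, V @ A.T).T


def solve_V(A, U, lam):
    """Exact minimizer over V with U fixed (closed-form ridge solve)."""
    gram = U.T @ U + lam * np.eye(U.shape[1])
    return np.linalg.solve(gram, U.T @ A)


def monotone_accelerated_am(A, rank=5, lam=0.1, iters=50, seed=0):
    """Toy two-block alternating minimization with extrapolation and a monotone safeguard."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((rank, n))
    V_prev = V.copy()
    t = 1.0
    history = [objective(A, U, V, lam)]
    for _ in range(iters):
        # Nesterov momentum weight from the usual t-sequence.
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_next
        # Extrapolate V only: the exact U update depends on V alone.
        V_hat = V + beta * (V - V_prev)
        U_new = solve_U(A, V_hat, lam)
        V_new = solve_V(A, U_new, lam)
        f_new = objective(A, U_new, V_new, lam)
        if f_new > history[-1]:
            # Safeguard: a plain alternating step from (U, V) cannot increase
            # the objective, so fall back to it and restart the momentum.
            U_new = solve_U(A, V, lam)
            V_new = solve_V(A, U_new, lam)
            f_new = objective(A, U_new, V_new, lam)
            t_next = 1.0
        V_prev, U, V, t = V, U_new, V_new, t_next
        history.append(f_new)
    return U, V, history


if __name__ == "__main__":
    A = np.random.default_rng(1).standard_normal((30, 20))
    _, _, history = monotone_accelerated_am(A)
    print("initial objective:", history[0])
    print("final objective:", history[-1])
```

The safeguard is what makes the objective sequence non-increasing in this sketch; how the paper itself enforces monotonicity is not stated in this summary.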
Findings
Demonstrates convergence and efficiency on benchmark datasets
Achieves linear convergence rate with Nesterov acceleration
Outperforms traditional gradient-based methods in experiments
Abstract
Even though Stochastic Gradient Descent (SGD) and its variants are well known for training neural networks, they suffer from limitations such as a lack of theoretical guarantees, vanishing gradients, and excessive sensitivity to input. To overcome these drawbacks, alternating minimization methods have attracted fast-increasing attention recently. As an emerging and open domain, however, it poses several new challenges that need to be addressed, including 1) convergence properties that are sensitive to penalty parameters, and 2) a slow theoretical convergence rate. We therefore propose a novel monotonous Deep Learning Alternating Minimization (mDLAM) algorithm to deal with these two challenges. Our innovative inequality-constrained formulation infinitely approximates the original problem with non-convex equality constraints, enabling our convergence proof of the proposed mDLAM algorithm…
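To make the idea of an inequality constraint that approximates an equality constraint concrete, the relaxation can be sketched as follows. The symbols $z_l$ (a layer's pre-activation), $a_l$ (its output), $f_l$ (its activation function), and $\varepsilon$ (a tolerance) are assumed notation for illustration only; the abstract does not give the paper's formulation, which may differ.

```latex
\begin{align}
  % Equality constraint coupling a layer's pre-activation and its output
  a_l &= f_l(z_l) \\
  % Relaxation to a band of half-width \varepsilon around the activation
  f_l(z_l) - \varepsilon \;\le\; a_l &\;\le\; f_l(z_l) + \varepsilon,
  \qquad \varepsilon > 0
\end{align}
% As \varepsilon \to 0 the band collapses back to the equality constraint.
```

For an elementwise monotone activation such as ReLU, the values of $z_l$ that satisfy the band for a fixed $a_l$ form an interval in each coordinate, which is one reason a relaxation of this kind can keep per-block subproblems convex.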
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
