Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods
Jintao Xu, Chenglong Bao, Wenxun Xing

TL;DR
This paper presents a unified framework for analyzing the convergence rates of alternating minimization methods for training deep neural networks, using the Kurdyka-Lojasiewicz property to relax the requirement that the training algorithm be a descent method.
Contribution
It introduces a novel convergence analysis framework for AM-type training methods of DNNs, accommodating various KL exponents and decrease conditions.
Findings
Convergence rates depend on the KL exponent $\theta \in [0,1)$.
Local R-linear convergence is achieved under stronger decrease conditions.
The framework relaxes the need for designing specific descent algorithms.
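For readers unfamiliar with the KL exponent, the block below recalls the standard textbook form of the KL inequality and the classical rate pattern known for descent methods. It is a background sketch, not a restatement of the paper's theorems: the symbols $F$, $x^*$, $C$, $\tau$, and the neighborhood condition are notation assumed here, and the exact rates the paper derives under its multi-step, non-monotone decrease conditions may differ in form and constants.

```latex
% KL inequality with exponent \theta \in [0,1) near a critical point x^*:
% for all x close to x^* with F(x^*) < F(x) < F(x^*) + \eta,
\bigl(F(x) - F(x^*)\bigr)^{\theta} \;\le\; C \,\operatorname{dist}\bigl(0, \partial F(x)\bigr).

% Classical rate pattern for descent-type iterates x_k \to x^*:
\begin{cases}
\theta = 0: & x_k = x^* \text{ after finitely many steps},\\[2pt]
\theta \in (0, \tfrac12]: & \|x_k - x^*\| \le C\,\tau^{k}, \quad \tau \in (0,1) \text{ (linear)},\\[2pt]
\theta \in (\tfrac12, 1): & \|x_k - x^*\| \le C\,k^{-\frac{1-\theta}{2\theta-1}} \text{ (sublinear)}.
\end{cases}
```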
Abstract
Training deep neural networks (DNNs) is an important and challenging optimization problem in machine learning due to its non-convexity and non-separable structure. Alternating minimization (AM) approaches split the composition structure of DNNs and have drawn great interest in the deep learning and optimization communities. In this paper, we propose a unified framework for analyzing the convergence rate of AM-type network training methods. Our analysis is based on non-monotone $j$-step sufficient decrease conditions and the Kurdyka-Lojasiewicz (KL) property, which relaxes the requirement of designing descent algorithms. We show the detailed local convergence rate when the KL exponent $\theta$ varies in $[0,1)$. Moreover, local R-linear convergence is discussed under a stronger $j$-step sufficient decrease condition.
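To make "splitting the composition structure" concrete, here is a minimal sketch of an AM-type scheme on a two-layer network. It is an illustration under stated assumptions, not the authors' algorithm: the penalty formulation, the tanh activation, the quadratic loss, the penalty weight `gamma`, and the names `objective` and `am_train` are all hypothetical choices made here. An auxiliary variable `A` stands in for the hidden activations, so the objective can be minimized block by block.

```python
import numpy as np

def objective(W1, W2, A, X, Y, gamma):
    """Penalized objective: output fit plus a penalty linking A to tanh(W1 X)."""
    fit = 0.5 * np.sum((W2 @ A - Y) ** 2)
    link = 0.5 * gamma * np.sum((A - np.tanh(W1 @ X)) ** 2)
    return fit + link

def am_train(X, Y, hidden=8, gamma=1.0, iters=50, seed=0):
    """Alternate over the blocks W2, A, W1 (illustrative assumption, see lead-in)."""
    rng = np.random.default_rng(seed)
    W1 = 0.5 * rng.standard_normal((hidden, X.shape[0]))
    W2 = 0.5 * rng.standard_normal((Y.shape[0], hidden))
    A = np.tanh(W1 @ X)
    history = [objective(W1, W2, A, X, Y, gamma)]
    for _ in range(iters):
        # Block 1: exact least-squares update of the output weights W2.
        W2 = np.linalg.lstsq(A.T, Y.T, rcond=None)[0].T
        # Block 2: exact update of A (strictly convex quadratic in A).
        lhs = W2.T @ W2 + gamma * np.eye(hidden)
        rhs = W2.T @ Y + gamma * np.tanh(W1 @ X)
        A = np.linalg.solve(lhs, rhs)
        # Block 3: one gradient step on W1, halving the step until
        # the objective does not increase (W1 is kept if no step works).
        H = np.tanh(W1 @ X)
        grad = gamma * ((H - A) * (1.0 - H ** 2)) @ X.T
        current = objective(W1, W2, A, X, Y, gamma)
        step = 1.0
        for _ in range(30):
            candidate = W1 - step * grad
            if objective(candidate, W2, A, X, Y, gamma) <= current:
                W1 = candidate
                break
            step *= 0.5
        history.append(objective(W1, W2, A, X, Y, gamma))
    return W1, W2, A, history
```

Calling `am_train(X, Y)` with `X` of shape `(features, samples)` and `Y` of shape `(outputs, samples)` returns the blocks and the objective history. Every block update in this sketch is chosen so that the objective never increases, which is the monotone special case; the framework described in the abstract is aimed at methods that only satisfy a weaker, non-monotone multi-step decrease condition.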
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
