The duality structure gradient descent algorithm: analysis and applications to neural networks
Thomas Flynn

TL;DR
This paper introduces the duality structure gradient descent (DSGD) algorithm. DSGD makes neural network training amenable to non-asymptotic analysis by greedily selecting which layer to update at each iteration. The method is validated empirically in both deterministic and stochastic settings.
Contribution
The paper proposes DSGD, a layer-wise coordinate descent algorithm for training neural networks. Its convergence guarantees hold under mild assumptions on the training set and network architecture.
Findings
DSGD converges to approximate stationary points in neural network training.
The algorithm performs well in both deterministic and stochastic settings.
Empirical results demonstrate effective training behavior across different neural network architectures.
Abstract
The training of machine learning models is typically carried out using some form of gradient descent, often with great success. However, non-asymptotic analyses of first-order optimization algorithms typically employ a gradient smoothness assumption (formally, Lipschitz continuity of the gradient) that is too strong to be applicable in the case of deep neural networks. To address this, we propose an algorithm named duality structure gradient descent (DSGD) that is amenable to non-asymptotic performance analysis, under mild assumptions on the training set and network architecture. The algorithm can be viewed as a form of layer-wise coordinate descent, where at each iteration the algorithm chooses one layer of the network to update. The decision of what layer to update is done in a greedy fashion, based on a rigorous lower bound on the improvement of the objective function for each choice…
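The abstract describes the layer-selection rule only at a high level. As a rough illustration, here is a minimal sketch of greedy layer-wise (block) coordinate descent. It assumes each "layer" i has a known Euclidean smoothness constant L_i, so that a step of size 1/L_i on layer i lowers the loss by at least ||g_i||^2 / (2 L_i). That textbook quadratic bound is a stand-in: the paper's own bound is built on its duality structure, which this sketch does not reproduce. All names here are hypothetical, including `greedy_block_step`, `run`, `smoothness`, `tol`, and the toy quadratic.

```python
from typing import Callable, List, Sequence, Tuple

Block = List[float]  # parameters of one "layer", flattened (toy assumption)


def sq_norm(v: Sequence[float]) -> float:
    """Squared Euclidean norm of a flat vector."""
    return sum(x * x for x in v)


def greedy_block_step(
    blocks: List[Block],
    grad_fn: Callable[[List[Block]], List[Block]],
    smoothness: Sequence[float],
) -> Tuple[List[Block], int, float]:
    """Update only the block with the largest guaranteed decrease.

    Hypothetical assumption: the loss is L_i-smooth in block i, so a step
    of size 1/L_i on block i lowers the loss by at least ||g_i||^2 / (2 L_i).
    """
    grads = grad_fn(blocks)
    bounds = [sq_norm(g) / (2.0 * L) for g, L in zip(grads, smoothness)]
    i = max(range(len(blocks)), key=lambda k: bounds[k])  # greedy choice
    step = 1.0 / smoothness[i]
    new_blocks = [list(b) for b in blocks]  # copy; other blocks stay fixed
    new_blocks[i] = [w - step * g for w, g in zip(blocks[i], grads[i])]
    return new_blocks, i, bounds[i]


def run(blocks, grad_fn, loss_fn, smoothness, iters, tol=1e-12):
    """Repeat greedy block steps; stop once every block's bound is below tol."""
    history = [loss_fn(blocks)]
    chosen = []
    for _ in range(iters):
        new_blocks, i, bound = greedy_block_step(blocks, grad_fn, smoothness)
        if bound < tol:  # approximately stationary in every block
            break
        blocks = new_blocks
        chosen.append(i)
        history.append(loss_fn(blocks))
    return blocks, chosen, history


# Toy separable quadratic (hypothetical): f(w) = sum_i a_i/2 * ||w_i - t_i||^2.
# Its smoothness constant in block i is exactly a_i.
A = [1.0, 4.0]
T = [[1.0, 1.0], [0.5]]


def toy_loss(blocks: List[Block]) -> float:
    return sum(
        a / 2.0 * sq_norm([w - t for w, t in zip(b, tb)])
        for a, b, tb in zip(A, blocks, T)
    )


def toy_grad(blocks: List[Block]) -> List[Block]:
    return [[a * (w - t) for w, t in zip(b, tb)] for a, b, tb in zip(A, blocks, T)]


if __name__ == "__main__":
    final, chosen, history = run([[0.0, 0.0], [0.0]], toy_grad, toy_loss, A, iters=5)
    print(chosen, history)
```

Run as a script, this prints `[0, 1] [1.5, 0.5, 0.0]`. The first layer is chosen first because its bound (1.0) exceeds the second layer's (0.5). The loop then stops once both bounds vanish. On this quadratic toy each realized decrease equals its bound exactly. For a real network the bound is only a guarantee, and the true decrease can be larger.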
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
