The duality structure gradient descent algorithm: analysis and applications to neural networks

Thomas Flynn

arXiv:1708.00523 · cs.LG · June 18, 2024

TL;DR

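This paper introduces duality structure gradient descent (DSGD), a layer-wise coordinate descent algorithm that greedily chooses one layer of the network to update at each iteration. The design makes neural network training amenable to non-asymptotic analysis without assuming a Lipschitz-continuous gradient, and the paper also reports empirical validation.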

Contribution

The paper proposes DSGD, a layer-wise coordinate descent algorithm for neural networks, with non-asymptotic convergence guarantees under mild assumptions on the training set and network architecture.

Findings

01. DSGD converges to approximate stationary points in neural network training.

02. The algorithm performs well in both deterministic and stochastic settings.

03. Empirical results demonstrate effective training behavior across different neural network architectures.

Abstract

The training of machine learning models is typically carried out using some form of gradient descent, often with great success. However, non-asymptotic analyses of first-order optimization algorithms typically employ a gradient smoothness assumption (formally, Lipschitz continuity of the gradient) that is too strong to be applicable in the case of deep neural networks. To address this, we propose an algorithm named duality structure gradient descent (DSGD) that is amenable to non-asymptotic performance analysis, under mild assumptions on the training set and network architecture. The algorithm can be viewed as a form of layer-wise coordinate descent, where at each iteration the algorithm chooses one layer of the network to update. The decision of what layer to update is done in a greedy fashion, based on a rigorous lower bound on the improvement of the objective function for each choice…
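
The greedy layer selection described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's DSGD, whose construction is not reproduced here; it is a simplified greedy block coordinate descent on a two-layer "diagonal" linear model, chosen because the loss is exactly quadratic in each layer when the other is held fixed. That gives a layer-wise curvature bound `L` (the trace of the layer-wise Hessian) that depends on the other layer's weights, and a step of size `1/L` on one layer is then guaranteed to lower the loss by at least `||grad||^2 / (2L)`. All names (`DATA`, `predict`, `layer_constants`, `greedy_step`) and the model itself are assumptions made for illustration.

```python
# Toy two-layer "diagonal" linear model: f(x) = sum_j v[j] * u[j] * x[j].
# Layer 1 is u, layer 2 is v. Hypothetical setup, not the paper's algorithm.

DATA = [([1.0, 2.0], 1.0), ([0.5, -1.0], -2.0), ([-1.5, 0.5], 0.5)]


def predict(u, v, x):
    return sum(uj * vj * xj for uj, vj, xj in zip(u, v, x))


def loss(u, v, data):
    # Mean squared error with the usual 1/2 factor.
    return 0.5 * sum((predict(u, v, x) - y) ** 2 for x, y in data) / len(data)


def layer_grads(u, v, data):
    # Gradient of the loss with respect to each layer.
    n = len(data)
    gu, gv = [0.0] * len(u), [0.0] * len(v)
    for x, y in data:
        r = predict(u, v, x) - y
        for j, xj in enumerate(x):
            gu[j] += r * v[j] * xj / n
            gv[j] += r * u[j] * xj / n
    return gu, gv


def layer_constants(u, v, data):
    # With the other layer fixed, the loss is quadratic in this layer.
    # The trace of that layer-wise Hessian upper-bounds its largest eigenvalue.
    n = len(data)
    m = [sum(x[j] ** 2 for x, _ in data) / n for j in range(len(u))]
    Lu = sum(vj ** 2 * mj for vj, mj in zip(v, m))
    Lv = sum(uj ** 2 * mj for uj, mj in zip(u, m))
    return Lu, Lv


def guaranteed_decrease(grad, L):
    # Lower bound on the loss decrease from a step of size 1/L on one layer.
    if L == 0.0:
        return 0.0
    return sum(g * g for g in grad) / (2.0 * L)


def greedy_step(u, v, data):
    # Update only the layer whose guaranteed decrease is largest.
    gu, gv = layer_grads(u, v, data)
    Lu, Lv = layer_constants(u, v, data)
    bu, bv = guaranteed_decrease(gu, Lu), guaranteed_decrease(gv, Lv)
    if bu >= bv:
        if Lu > 0.0:
            u = [uj - g / Lu for uj, g in zip(u, gu)]
        return u, v, "u", bu
    # Here bv > bu >= 0, so Lv is strictly positive.
    v = [vj - g / Lv for vj, g in zip(v, gv)]
    return u, v, "v", bv


if __name__ == "__main__":
    u, v = [0.5, -1.0], [2.0, 0.3]
    for step in range(10):
        before = loss(u, v, DATA)
        u, v, layer, bound = greedy_step(u, v, DATA)
        # Actual decrease is printed next to the guaranteed lower bound.
        print(step, layer, before - loss(u, v, DATA), bound)
```

Recomputing the constants at every step matters in this toy: the curvature seen by one layer changes whenever the other layer moves, so no single global smoothness constant is assumed.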

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications