TL;DR
This paper introduces GNvpro, an extension of variable projection (VarPro) for neural network training that trains more efficiently and generalizes better than traditional stochastic gradient descent (SGD) methods across several applications.
Contribution
The paper develops the Gauss-Newton VarPro method for training DNNs with non-quadratic loss functions, broadening the applicability of variable projection techniques.
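For orientation, the classic variable projection idea that GNvpro builds on can be sketched for the simpler quadratic-loss case, where the weights of the affine last layer have a closed-form optimum for any fixed feature extractor. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the one-layer tanh feature extractor, the ridge parameter `alpha`, the toy data, and all function names are hypothetical. For non-quadratic losses such as the ones GNvpro targets, the inner problem generally has no closed form and would have to be solved iteratively.

```python
import numpy as np

def features(theta, X, width):
    """Hypothetical feature extractor tanh(K X + b), with a row of ones appended."""
    n_in = X.shape[0]
    K = theta[: width * n_in].reshape(width, n_in)
    b = theta[width * n_in:].reshape(width, 1)
    Z = np.tanh(K @ X + b)
    # The appended ones row turns the linear map W @ Z into an affine last layer.
    return np.vstack([Z, np.ones((1, X.shape[1]))])

def solve_last_layer(Z, C, alpha=1e-3):
    """Closed-form minimizer of 0.5*||W Z - C||_F^2 + 0.5*alpha*||W||_F^2 over W."""
    A = Z @ Z.T + alpha * np.eye(Z.shape[0])
    # Normal equations: W (Z Z^T + alpha I) = C Z^T; A is symmetric.
    return np.linalg.solve(A, Z @ C.T).T

def full_objective(W, Z, C, alpha=1e-3):
    """Regularized least-squares objective for given last-layer weights W."""
    R = W @ Z - C
    return 0.5 * np.sum(R ** 2) + 0.5 * alpha * np.sum(W ** 2)

def reduced_objective(theta, X, C, width, alpha=1e-3):
    """Objective as a function of theta only: W is eliminated by solving for it."""
    Z = features(theta, X, width)
    W = solve_last_layer(Z, C, alpha)
    return full_objective(W, Z, C, alpha), W

# Toy regression data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 50))        # 2 input features, 50 samples
C = np.sin(X[:1]) + X[1:] ** 2          # 1 target row, 50 samples
width = 8
theta = rng.standard_normal(width * 2 + width)

Z = features(theta, X, width)
phi, W_opt = reduced_objective(theta, X, C, width)
W_rand = rng.standard_normal(W_opt.shape)
print("reduced objective:", phi)
print("objective with random W:", full_objective(W_rand, Z, C))
```

An outer optimizer then updates only `theta`, re-solving for the last-layer weights at every evaluation, so the last layer is always optimal for the current features.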
Findings
GNvpro outperforms SGD in efficiency across multiple tasks.
GNvpro achieves better generalization to unseen data.
GNvpro applies to DNNs with an affine last layer, as used in classification tasks.
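The role of the affine last layer can be made explicit in symbols. The notation below is assumed for illustration and is not taken from the paper: $F(\cdot,\theta)$ is the feature extractor with weights $\theta$, $W$ holds the weights of the last layer, $Y$ and $C$ are input and target data, $L$ is the loss, and $R$, $S$ are optional regularizers.

\[
\min_{W,\theta}\ \Phi(W,\theta) = L\big(W F(Y,\theta),\, C\big) + R(\theta) + S(W)
\]

Variable projection eliminates $W$ and optimizes a reduced objective in $\theta$ alone:

\[
\Phi_{\mathrm{red}}(\theta) = \Phi\big(W(\theta),\theta\big),
\qquad
W(\theta) = \operatorname*{arg\,min}_{W}\ \Phi(W,\theta).
\]

Assuming the inner minimizer is unique and everything is differentiable, $\nabla_W \Phi\big(W(\theta),\theta\big) = 0$, so the chain rule gives $\nabla \Phi_{\mathrm{red}}(\theta) = \nabla_\theta \Phi\big(W(\theta),\theta\big)$: the gradient of the reduced objective does not require differentiating through the inner solve.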
Abstract
Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations (PDEs) and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, non-convexity, data-sparsity, and non-trivial choice of hyperparameters. To solve the…
Taxonomy
Methods: Stochastic Gradient Descent
