Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection

Elizabeth Newman; Lars Ruthotto; Joseph Hart; Bart van Bloemen Waanders

arXiv:2007.13171·cs.LG·April 21, 2021

TL;DR

This paper introduces GNvpro, an extension of variable projection for neural network training, which improves efficiency and generalization over traditional stochastic gradient descent methods in various applications.

Contribution

The paper develops the Gauss-Newton VarPro method for training DNNs with non-quadratic loss functions, broadening the applicability of variable projection techniques.
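As a rough illustration of the variable projection idea that this contribution builds on, the sketch below handles only the classical quadratic-loss case: for a network with an affine last layer, the last-layer weights are eliminated by solving a regularized least-squares problem, which leaves a reduced objective in the remaining weights. It does not implement GNvpro or any non-quadratic loss. The one-hidden-layer `features` function, the regularization weight `alpha`, and all function names are assumptions made for this example and are not taken from the paper or its repository.

```python
import numpy as np

def features(theta, X):
    """Hypothetical feature extractor: one tanh hidden layer.

    A row of ones is appended so the last layer is affine (weights plus bias).
    """
    K, b = theta
    Z = np.tanh(K @ X + b[:, None])
    return np.vstack([Z, np.ones((1, X.shape[1]))])

def solve_last_layer(Z, C, alpha=1e-3):
    """Minimize 0.5*||W Z - C||_F^2 + 0.5*alpha*||W||_F^2 over W.

    The normal equations are W (Z Z^T + alpha I) = C Z^T.
    """
    A = Z @ Z.T + alpha * np.eye(Z.shape[0])
    return np.linalg.solve(A, Z @ C.T).T

def reduced_loss(theta, X, C, alpha=1e-3):
    """Quadratic loss as a function of theta only, with W projected out."""
    Z = features(theta, X)
    W = solve_last_layer(Z, C, alpha)
    R = W @ Z - C
    loss = 0.5 * np.sum(R**2) + 0.5 * alpha * np.sum(W**2)
    return loss, W
```

In a full training loop, an outer optimizer would update `theta` using the gradient of `reduced_loss`, and `W` would be re-solved at every evaluation. Which outer optimizer to use, and how to treat losses that are not quadratic, are left open by this sketch.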

Findings

01. GNvpro outperforms SGD in efficiency across multiple tasks.

02. GNvpro achieves better generalization to unseen data.

03. GNvpro applies to DNNs with affine last layers in classification tasks.

Abstract

Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations (PDE) and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, non-convexity, data-sparsity, and non-trivial choice of hyperparameters. To solve the…

Code & Models

Repositories

elizabethnewman/slimTrain (PyTorch)

Taxonomy

Methods: Stochastic Gradient Descent