A Multilevel Approach to Training

Vanessa Braglia; Alena Kopaničáková; Rolf Krause

arXiv:2006.15602·cs.LG·June 30, 2020

Open Access

TL;DR

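This paper introduces a multilevel training method that builds a hierarchy of surrogate models from fewer samples and trains them internally to speed up training of the original model. The surrogates' gradients estimate the full gradient with lower variance than standard stochastic gradient estimators.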

Contribution

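It carries nonlinear multilevel minimization techniques, commonly used for discretized large-scale partial differential equations, over to machine learning. Surrogate models built with a first-order consistency approach give reduced-variance gradient estimates and enhance training convergence.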

Findings

01. Improved convergence in logistic regression tasks

02. Surrogate models reduce gradient variance

03. Outperforms subsampled Newton's and variance reduction methods

Abstract

We propose a novel training method based on nonlinear multilevel minimization techniques, commonly used for solving discretized large-scale partial differential equations. Our multilevel training method constructs a multilevel hierarchy by reducing the number of samples. The training of the original model is then enhanced by internally training surrogate models constructed with fewer samples. We construct the surrogate models using a first-order consistency approach. This gives rise to surrogate models whose gradients are stochastic estimators of the full gradient, but with reduced variance compared to standard stochastic gradient estimators. We illustrate the convergence behavior of the proposed multilevel method on machine learning applications based on logistic regression. A comparison with subsampled Newton's and variance reduction methods demonstrates the efficiency of our multilevel…
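
To make the first-order consistency idea concrete, here is a minimal NumPy sketch for logistic regression. It assumes the additive correction commonly used in multilevel optimization: the surrogate is h(w) = f_S(w) + <∇f(w0) - ∇f_S(w0), w - w0>, where f is the full-sample loss, f_S the loss on a subsample S, and w0 the current iterate. The paper's exact construction is not given in this summary, so the function names (`logistic_loss_grad`, `make_surrogate`), the synthetic data, and the sample sizes are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Mean logistic loss and its gradient; labels y are in {-1, +1}."""
    z = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -z))   # stable log(1 + exp(-z))
    s = 0.5 * (1.0 - np.tanh(0.5 * z))      # stable sigmoid(-z)
    grad = -(X.T @ (y * s)) / X.shape[0]
    return loss, grad

def make_surrogate(w0, X, y, idx):
    """Hypothetical helper: subsampled model with a first-order correction.

    The linear term makes the surrogate gradient at w0 equal the
    full-sample gradient at w0 (first-order consistency).
    """
    _, g_full = logistic_loss_grad(w0, X, y)
    _, g_sub = logistic_loss_grad(w0, X[idx], y[idx])
    correction = g_full - g_sub

    def h(w):
        loss, grad = logistic_loss_grad(w, X[idx], y[idx])
        return loss + correction @ (w - w0), grad + correction

    return h

# Synthetic data (assumed purely for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
w_true = rng.standard_normal(5)
y = np.where(X @ w_true + 0.5 * rng.standard_normal(1000) > 0, 1.0, -1.0)

w0 = np.zeros(5)
idx = rng.choice(1000, size=100, replace=False)   # surrogate uses 10% of the samples
h = make_surrogate(w0, X, y, idx)

_, g_full = logistic_loss_grad(w0, X, y)
_, g_surrogate = h(w0)
print("max gradient mismatch at w0:", np.max(np.abs(g_surrogate - g_full)))
```

Under this assumed construction, the surrogate gradient away from w0 is ∇f_S(w) - ∇f_S(w0) + ∇f(w0), which has the same form as the estimators used in variance reduction methods. That is one way to read the abstract's statement that the surrogate gradients are reduced-variance estimators of the full gradient.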

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Statistical Methods and Inference