slimTrain -- A Stochastic Approximation Method for Training Separable Deep Neural Networks

Elizabeth Newman; Julianne Chung; Matthias Chung; Lars Ruthotto

arXiv:2109.14002 · cs.LG · September 30, 2021


TL;DR

slimTrain is a new stochastic optimization method tailored to separable deep neural networks, i.e., networks that consist of a nonlinear feature extractor followed by a linear model. It reduces sensitivity to hyperparameter choices and enables faster, more reliable training, especially on complex scientific datasets.

Contribution

The paper introduces slimTrain, a training algorithm that exploits the separable structure of the DNN architecture to improve training efficiency and robustness without extensive hyperparameter tuning.
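To make the separable structure concrete, here is a minimal NumPy sketch under stated assumptions: a hypothetical one-hidden-layer tanh feature extractor with weights `K` and `b`, a final linear layer `W`, and a squared loss. It uses plain variable projection: on each mini-batch, `W` is eliminated by solving a Tikhonov-regularized least-squares problem in closed form, and only `(K, b)` take a stochastic gradient step. This is not slimTrain itself; per its abstract, the paper's method updates the linear weights without a learning rate and adapts their regularization parameter automatically, whereas here `lam`, `lr`, and every function name are illustrative choices.

```python
import numpy as np


def features(Y, K, b):
    """Hypothetical nonlinear feature extractor: Z = tanh(Y K + b)."""
    return np.tanh(Y @ K + b)


def solve_linear_weights(Z, C, lam):
    """Final linear layer by Tikhonov-regularized least squares.

    Solves min_W ||Z W - C||_F^2 + lam ||W||_F^2 via (Z^T Z + lam I) W = Z^T C.
    """
    m = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ C)


def reduced_loss_and_grad(Y, C, K, b, lam):
    """Variable projection: eliminate W, then differentiate w.r.t. (K, b).

    W is an exact minimizer for the current features, so (envelope theorem)
    the gradient of the reduced loss equals the partial gradient at fixed W.
    """
    n = Y.shape[0]
    Z = features(Y, K, b)
    W = solve_linear_weights(Z, C, lam)
    R = Z @ W - C                                   # residuals
    loss = 0.5 * (np.sum(R**2) + lam * np.sum(W**2)) / n
    dZ = (R @ W.T) / n                              # d loss / d Z with W held fixed
    dA = dZ * (1.0 - Z**2)                          # back through tanh
    return loss, Y.T @ dA, dA.sum(axis=0), W


def train(Y, C, width=16, lam=1e-2, lr=0.1, batch=32, epochs=50, seed=0):
    """Mini-batch SGD on (K, b) only; W is re-solved on every batch."""
    rng = np.random.default_rng(seed)
    d = Y.shape[1]
    K = rng.standard_normal((d, width)) / np.sqrt(d)
    b = np.zeros(width)
    history = []
    for _ in range(epochs):
        perm = rng.permutation(Y.shape[0])
        for i in range(0, Y.shape[0], batch):
            idx = perm[i:i + batch]
            _, gK, gb, _ = reduced_loss_and_grad(Y[idx], C[idx], K, b, lam)
            K -= lr * gK
            b -= lr * gb
        # full-data reduced loss after each epoch
        history.append(reduced_loss_and_grad(Y, C, K, b, lam)[0])
    return K, b, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Y = rng.uniform(-1.0, 1.0, size=(256, 2))
    C = np.sin(np.pi * Y[:, :1]) * Y[:, 1:]          # toy regression target
    K, b, history = train(Y, C)
    print(f"reduced loss: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

Re-solving `W` independently on each mini-batch is the simplest choice, but it discards information from earlier batches and still leaves `lam` and `lr` as hand-tuned hyperparameters, which is the kind of sensitivity the paper sets out to reduce.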

Findings

1. Outperforms existing methods with default hyperparameters
2. Reduces sensitivity to hyperparameter choices
3. Achieves faster initial convergence in experiments

Abstract

Deep neural networks (DNNs) have shown their success as high-dimensional function approximators in many applications; however, training DNNs can be challenging in general. DNN training is commonly phrased as a stochastic optimization problem whose challenges include non-convexity, non-smoothness, insufficient regularization, and complicated data distributions. Hence, the performance of DNNs on a given task depends crucially on tuning hyperparameters, especially learning rates and regularization parameters. In the absence of theoretical guidelines or prior experience on similar tasks, this requires solving many training problems, which can be time-consuming and demanding on computational resources. This can limit the applicability of DNNs to problems with non-standard, complex, and scarce datasets, e.g., those arising in many scientific applications. To remedy the challenges of DNN…
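For a separable network, one common way to write this stochastic optimization problem is sketched below. The notation is chosen here for illustration and is not taken from the paper: the weights are split into those of a nonlinear feature extractor F(·; θ) and those of a final linear layer W, assuming a squared loss, a data distribution 𝒟 over input/target pairs (y, c), and Tikhonov regularization with parameters α and λ.

```latex
\min_{\theta,\,W}\;
\mathbb{E}_{(\mathbf{y},\mathbf{c})\sim\mathcal{D}}
\Bigl[\tfrac{1}{2}\bigl\|W\,F(\mathbf{y};\theta)-\mathbf{c}\bigr\|_2^2\Bigr]
+\frac{\alpha}{2}\,\|\theta\|_2^2
+\frac{\lambda}{2}\,\|W\|_F^2
```

For fixed θ, the objective is a regularized linear least-squares problem in W, which is the structure a separable method can exploit; the learning rate used for θ and the parameters α and λ are examples of the hyperparameters that otherwise have to be tuned.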


Code & Models

Repositories

xtractopen/meganet.m (official)


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Gaussian Processes and Bayesian Inference