slimTrain -- A Stochastic Approximation Method for Training Separable Deep Neural Networks
Elizabeth Newman, Julianne Chung, Matthias Chung, Lars Ruthotto

TL;DR
slimTrain is a stochastic optimization method tailored to separable deep neural networks. It reduces sensitivity to hyperparameters and enables faster, more reliable training, especially on complex scientific datasets.
Contribution
The paper introduces slimTrain, a training algorithm that exploits the separability of DNN architectures to improve training efficiency and robustness without extensive hyperparameter tuning.
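A minimal sketch of what separability can buy. It assumes the network ends in a linear layer W applied to nonlinear features Z = φ(X; θ); this reading of "separable" is an assumption of the sketch. Once θ is fixed, the Tikhonov-regularized optimal W for a batch has a closed form, so the linear weights need no learning rate. The tanh feature extractor `features`, the helper `solve_linear_weights`, and the regularization values are hypothetical. This single-batch solve illustrates the idea only and is not slimTrain's own update rule.

```python
import numpy as np

def features(X, theta):
    """Hypothetical nonlinear feature extractor: a single tanh layer.

    X has shape (n_samples, d_in); theta = (K, b) with K of shape (d_in, m)
    and b of shape (m,). Returns Z with shape (n_samples, m).
    """
    K, b = theta
    return np.tanh(X @ K + b)

def solve_linear_weights(Z, Y, lam):
    """Tikhonov-regularized least squares for the final linear layer.

    Minimizes ||Z @ W - Y||_F^2 + lam * ||W||_F^2 over W. The minimizer
    satisfies (Z^T Z + lam I) W = Z^T Y, solved here via the normal equations.
    """
    m = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))                                 # 200 samples, 3 inputs
theta = (rng.standard_normal((3, 16)), rng.standard_normal(16))   # frozen nonlinear weights
Z = features(X, theta)                                            # 200 x 16 feature matrix
W_true = rng.standard_normal((16, 2))
Y = Z @ W_true                                                    # targets the linear layer can fit

W = solve_linear_weights(Z, Y, lam=1e-8)
residual = np.linalg.norm(Z @ W - Y) / np.linalg.norm(Y)
print(f"relative residual: {residual:.2e}")
```

Larger values of `lam` shrink W toward zero, which trades data fit for stability. That trade-off is the regularization parameter that, according to the abstract, otherwise has to be tuned by hand.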
Findings
Outperforms existing training methods when run with default hyperparameters
Reduces sensitivity to hyperparameter choices
Achieves faster initial convergence in experiments
Abstract
Deep neural networks (DNNs) have shown their success as high-dimensional function approximators in many applications; however, training DNNs can be challenging in general. DNN training is commonly phrased as a stochastic optimization problem whose challenges include non-convexity, non-smoothness, insufficient regularization, and complicated data distributions. Hence, the performance of DNNs on a given task depends crucially on tuning hyperparameters, especially learning rates and regularization parameters. In the absence of theoretical guidelines or prior experience on similar tasks, this requires solving many training problems, which can be time-consuming and demanding on computational resources. This can limit the applicability of DNNs to problems with non-standard, complex, and scarce datasets, e.g., those arising in many scientific applications. To remedy the challenges of DNN…
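To make the role of the learning rate and regularization parameter concrete, here is a toy sketch of mini-batch stochastic gradient descent on a regularized objective. It uses a convex linear least-squares model rather than a DNN, so it shows none of the non-convexity discussed above. The functions `sgd` and `loss`, the synthetic data, and all parameter values are hypothetical, and this is generic SGD, not slimTrain.

```python
import numpy as np

def loss(X, y, w, lam):
    """Regularized least-squares objective (1/2n)||Xw - y||^2 + (lam/2)||w||^2."""
    r = X @ w - y
    return 0.5 * (r @ r) / len(y) + 0.5 * lam * (w @ w)

def sgd(X, y, lr, lam, epochs=50, batch_size=10, seed=0):
    """Mini-batch SGD on `loss`; lr and lam are user-chosen hyperparameters."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)                  # reshuffle samples every epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # unbiased mini-batch estimate of the full gradient
            grad = Xb.T @ (Xb @ w - yb) / len(idx) + lam * w
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ w_true + 0.1 * rng.standard_normal(100)
lam = 1e-3

# exact minimizer of the full objective, used only as a reference point
w_star = np.linalg.solve(X.T @ X / len(y) + lam * np.eye(5), X.T @ y / len(y))
f_star = loss(X, y, w_star, lam)

for lr in (1e-4, 1e-2, 5e-2):
    w = sgd(X, y, lr=lr, lam=lam)
    print(f"lr={lr:g}: excess loss {loss(X, y, w, lam) - f_star:.2e}")
```

With the same budget of 500 steps, the smallest learning rate barely moves from the initial guess, while the larger ones approach the optimum. Even on this easy problem, the result depends on a hyperparameter that must be chosen in advance.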
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Gaussian Processes and Bayesian Inference
