Multilevel Initialization for Layer-Parallel Deep Neural Network Training
Eric C. Cyr, Stefanie Günther, Jacob B. Schroder

TL;DR
This paper introduces a multilevel initialization approach for deep neural networks using layer-parallel multigrid methods, improving training efficiency and robustness by leveraging coarse-to-fine network refinements.
Contribution
It proposes a novel multilevel initialization strategy based on optimal control and multigrid concepts, enhancing deep network training scalability and stability.
Findings
Reduced training run time for equivalent accuracy
Improved initialization leading to better regularization effects
Decreased sensitivity to hyperparameters and initial conditions
Abstract
This paper investigates multilevel initialization strategies for training very deep neural networks with a layer-parallel multigrid solver. The scheme is based on the continuous interpretation of the training problem as a problem of optimal control, in which neural networks are represented as discretizations of time-dependent ordinary differential equations. A key goal is to develop a method able to intelligently initialize the network parameters for the very deep networks enabled by scalable layer-parallel training. To do this, we apply a refinement strategy across the time domain, which is equivalent to refining in the layer dimension. The resulting refinements create deep networks, with good initializations for the network parameters coming from the coarser trained networks. We investigate the effectiveness of such multilevel "nested iteration" strategies for network training, showing…
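The refinement idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a residual network read as a forward Euler discretization of u'(t) = tanh(W(t) u + b(t)), and it assumes piecewise-constant interpolation of the parameters in time when doubling the number of layers. The function names (`forward`, `refine`, `nested_iteration`) and the `train` callable are hypothetical, and the layer-parallel multigrid solver itself is not shown.

```python
import numpy as np

def forward(u0, weights, biases, T=1.0):
    """Forward Euler steps of u'(t) = tanh(W(t) u + b(t)) on [0, T].

    Each layer is one time step of size h = T / n_layers (assumed form).
    """
    h = T / len(weights)
    u = u0
    for W, b in zip(weights, biases):
        u = u + h * np.tanh(W @ u + b)
    return u

def refine(weights, biases):
    """Double the layer count by copying each layer's parameters to the two
    fine layers that cover the same time interval (assumed interpolation)."""
    fine_w = [W.copy() for W in weights for _ in range(2)]
    fine_b = [b.copy() for b in biases for _ in range(2)]
    return fine_w, fine_b

def nested_iteration(weights, biases, train, n_levels):
    """Train on the coarsest network, refine, and repeat.

    `train` is a hypothetical callable (weights, biases) -> (weights, biases);
    any optimizer, serial or layer-parallel, could be plugged in here.
    """
    for level in range(n_levels):
        weights, biases = train(weights, biases)
        if level < n_levels - 1:
            weights, biases = refine(weights, biases)
    return weights, biases
```

Because the time horizon T is fixed, halving the step size while copying parameters leaves the refined network close to the coarse one, which is the sense in which the coarse trained network supplies an initial guess for the deeper one.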
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
