Splitting Steepest Descent for Growing Neural Architectures
Qiang Liu, Lemeng Wu, Dilin Wang

TL;DR
This paper introduces a progressive training method that grows a neural network by splitting existing neurons. A functional steepest descent view decides which neurons to split and how to update their offspring, which lets training escape saddle points.
Contribution
It proposes a neuron-splitting criterion and a splitting gradient, both derived from second-order functional steepest descent, as a computationally efficient way to optimize network structure.
Findings
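Splitting neurons helps training escape saddle points.
The method grows architectures efficiently, which suits lightweight networks in resource-constrained settings.
Splitting is characterized as a second-order functional steepest descent, giving adaptive network splitting a theoretical foundation.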
Abstract
We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons to multiple off-springs. By leveraging a functional steepest descent idea, we derive a simple criterion for deciding the best subset of neurons to split and a splitting gradient for optimally updating the off-springs. Theoretically, our splitting strategy is a second-order functional steepest descent for escaping saddle points in an ∞-Wasserstein metric space, on which the standard parametric gradient descent is a first-order steepest descent. Our method provides a new computationally efficient approach for optimizing neural network structures, especially for learning lightweight neural architectures in resource-constrained settings.
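The abstract does not spell out the splitting criterion, so the following is only a minimal sketch under stated assumptions. It assumes a single neuron sigma(theta, x) = tanh(theta . x) with a 2-dimensional theta, a squared loss Phi(s) = 0.5 * (s - y)^2, and a "splitting matrix" S(theta) defined as the data average of Phi'(sigma) times the Hessian of sigma with respect to theta. Under these assumptions the neuron is split only when the smallest eigenvalue of S is negative, and the two offspring are placed at theta plus and minus a small step along the corresponding eigenvector, each with output weight 1/2. The function names, the toy data, and the step size eps are illustrative and not taken from the paper.

```python
import math

def splitting_matrix(theta, xs, ys):
    """Entries (a, b, d) of the symmetric 2x2 matrix [[a, b], [b, d]] given by
    mean over data of Phi'(sigma) * Hessian_theta(sigma), for sigma = tanh(theta . x)."""
    a = b = d = 0.0
    for (x0, x1), y in zip(xs, ys):
        s = math.tanh(theta[0] * x0 + theta[1] * x1)
        # Phi'(s) = s - y; tanh''(z) = -2 * tanh(z) * (1 - tanh(z)^2)
        coef = (s - y) * (-2.0 * s * (1.0 - s * s))
        a += coef * x0 * x0
        b += coef * x0 * x1
        d += coef * x1 * x1
    n = len(xs)
    return a / n, b / n, d / n

def min_eig_2x2(a, b, d):
    """Smallest eigenvalue and a unit eigenvector of [[a, b], [b, d]]."""
    lam = 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
    if abs(b) > 1e-12:
        v = (b, lam - a)
    else:  # diagonal matrix: pick the axis with the smaller entry
        v = (1.0, 0.0) if a <= d else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return lam, (v[0] / norm, v[1] / norm)

def split_neuron(theta, xs, ys, eps=0.1):
    """Return a list of (output_weight, parameters) pairs: two offspring if the
    assumed criterion (negative smallest eigenvalue) holds, else the original neuron."""
    lam, v = min_eig_2x2(*splitting_matrix(theta, xs, ys))
    if lam >= 0.0:
        return [(1.0, tuple(theta))]
    plus = (theta[0] + eps * v[0], theta[1] + eps * v[1])
    minus = (theta[0] - eps * v[0], theta[1] - eps * v[1])
    return [(0.5, plus), (0.5, minus)]

def loss(neurons, xs, ys):
    """Mean squared loss of the weighted sum of tanh neurons."""
    total = 0.0
    for (x0, x1), y in zip(xs, ys):
        out = sum(w * math.tanh(t[0] * x0 + t[1] * x1) for w, t in neurons)
        total += 0.5 * (out - y) ** 2
    return total / len(xs)

# Toy data (hypothetical), chosen so that the splitting matrix has a negative eigenvalue.
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
ys = [-1.0, -1.0, -1.0]
theta = (0.5, 0.0)
offspring = split_neuron(theta, xs, ys)
print(offspring, loss([(1.0, theta)], xs, ys), loss(offspring, xs, ys))
```

Because the offspring sit symmetrically around theta with equal weights, the first-order change in the network output cancels and only the second-order term involving S remains, which is why the sign of its smallest eigenvalue is the natural test in this sketch. The sketch checks only that criterion for one neuron; it does not choose among several neurons or apply the subsequent gradient updates to the offspring.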
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
