Splitting Steepest Descent for Growing Neural Architectures

Qiang Liu; Lemeng Wu; Dilin Wang

arXiv:1910.02366 · cs.LG · November 6, 2019 · 20 cites

TL;DR

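This paper introduces a progressive training method that grows a neural network by splitting existing neurons into multiple offspring. A functional steepest descent argument yields both the criterion for choosing which neurons to split and the update for the offspring, which lets training escape saddle points.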

Contribution

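It proposes a criterion for selecting the neurons to split and a splitting gradient for updating the offspring. It also shows that the splitting strategy is a second-order functional steepest descent in an $\infty$-Wasserstein metric space, where standard parametric gradient descent is the first-order counterpart.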

Findings

01 Splitting neurons lets training escape saddle points.

02 Progressive growth yields lightweight architectures suited to resource-constrained settings.

03 The splitting strategy has a theoretical foundation as a second-order functional steepest descent.

Abstract

We develop a progressive training approach for neural networks that adaptively grows the network structure by splitting existing neurons into multiple offspring. By leveraging a functional steepest descent idea, we derive a simple criterion for deciding the best subset of neurons to split and a splitting gradient for optimally updating the offspring. Theoretically, our splitting strategy is a second-order functional steepest descent for escaping saddle points in an $\infty$-Wasserstein metric space, on which the standard parametric gradient descent is a first-order steepest descent. Our method provides a new computationally efficient approach for optimizing neural network structures, especially for learning lightweight neural architectures in resource-constrained settings.
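
The abstract states the idea without formulas, so the following is only a rough sketch of what a second-order splitting step could look like, not the authors' implementation. It assumes a squared loss, a single tanh neuron, and a symmetric split into two offspring at theta + eps*v and theta - eps*v with weight 1/2 each. Under those assumptions the first-order change in the loss cancels, and the second-order change is (eps^2 / 2) * v^T S v, where S is the residual-weighted Hessian of the neuron with respect to its parameters. The toy data and every name here (`DATA`, `splitting_matrix`, `min_eigpair`, `split`) are hypothetical.

```python
import math

# Hypothetical toy data: 2-D inputs x with scalar targets y.
DATA = [((1.0, 0.0), 1.0), ((0.0, 1.0), -1.0),
        ((1.0, 1.0), 0.5), ((-1.0, 2.0), 0.0)]

def neuron(theta, x):
    """sigma(theta, x) = tanh(theta . x)."""
    return math.tanh(theta[0] * x[0] + theta[1] * x[1])

def loss(thetas, weights):
    """Mean of 0.5 * (f(x) - y)^2 with f(x) = sum_i w_i * sigma(theta_i, x)."""
    total = 0.0
    for x, y in DATA:
        f = sum(w * neuron(t, x) for t, w in zip(thetas, weights))
        total += 0.5 * (f - y) ** 2
    return total / len(DATA)

def splitting_matrix(theta):
    """Assumed form: S = mean[(f - y) * Hessian_theta sigma(theta, x)].

    Returned as (a, b, c) for the symmetric 2x2 matrix [[a, b], [b, c]].
    """
    a = b = c = 0.0
    for x, y in DATA:
        t = neuron(theta, x)
        coef = (t - y) * (-2.0 * t * (1.0 - t * t))  # residual * tanh''(z)
        a += coef * x[0] * x[0]
        b += coef * x[0] * x[1]
        c += coef * x[1] * x[1]
    n = len(DATA)
    return a / n, b / n, c / n

def min_eigpair(a, b, c):
    """Smallest eigenvalue and unit eigenvector of [[a, b], [b, c]]."""
    lam = 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)
    if b != 0.0:
        v = (b, lam - a)
    else:
        v = (1.0, 0.0) if a <= c else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return lam, (v[0] / norm, v[1] / norm)

def split(theta, v, eps):
    """Replace one neuron by two offspring at theta +/- eps * v, weight 1/2 each."""
    plus = (theta[0] + eps * v[0], theta[1] + eps * v[1])
    minus = (theta[0] - eps * v[0], theta[1] - eps * v[1])
    return [plus, minus], [0.5, 0.5]

theta = (0.5, -0.3)
eps = 0.01
lam_min, v_min = min_eigpair(*splitting_matrix(theta))
base = loss([theta], [1.0])

# Split only when the smallest eigenvalue is negative, i.e. when
# splitting is predicted to lower the loss.
should_split = lam_min < 0.0
thetas, weights = split(theta, v_min, eps)
observed = loss(thetas, weights) - base
predicted = 0.5 * eps ** 2 * lam_min
print(should_split, observed, predicted)
```

In this sketch the sign of the smallest eigenvalue plays the role of the splitting criterion, and the matching eigenvector gives the direction along which the two offspring move apart. With eps = 0 the two offspring coincide and the network function is unchanged, so growth starts from the current model rather than from scratch.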

Code & Models

Repositories

klightz/splitting (PyTorch)

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis