Growing Neural Networks: Dynamic Evolution through Gradient Descent

Anil Radhakrishnan; John F. Lindner; Scott T. Miller; Sudeshna Sinha; William L. Ditto

arXiv:2501.18012 · cs.LG · July 29, 2025

TL;DR

This paper introduces two gradient-based methods for evolving neural networks during training, allowing networks to grow dynamically and outperform static counterparts in regression and classification tasks.

Contribution

The paper presents two approaches for growing a neural network during training, one based on an auxiliary weight and one based on a controller-generated mask, with network size optimized by gradient descent.
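As a rough illustration of the auxiliary-weight idea, the sketch below treats network size as one more trainable scalar. Everything specific here is an assumption made for illustration and is not taken from the paper: the sigmoid form of `soft_mask`, the `sharpness` value, the one-hidden-layer architecture, and the names `init_params`, `forward`, and `sgd_step`.

```python
import jax
import jax.numpy as jnp

def soft_mask(size, width, sharpness=4.0):
    # Hypothetical smooth step: hidden unit i gets a weight near 1 when
    # i < size and near 0 when i > size, so the mask is differentiable in size.
    idx = jnp.arange(width, dtype=jnp.float32)
    return jax.nn.sigmoid(sharpness * (size - idx - 0.5))

def init_params(key, width):
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (1, width)) * 0.5,
        "b1": jnp.zeros(width),
        "w2": jax.random.normal(k2, (width, 1)) * 0.5,
        "b2": jnp.zeros(1),
        # Auxiliary weight: the (continuous) number of active hidden units.
        "size": jnp.array(2.0),
    }

def forward(params, x):
    width = params["b1"].shape[0]
    h = jnp.tanh(x @ params["w1"] + params["b1"])
    h = h * soft_mask(params["size"], width)  # gate units by the size weight
    return h @ params["w2"] + params["b2"]

def loss(params, x, y):
    return jnp.mean((forward(params, x) - y) ** 2)

@jax.jit
def sgd_step(params, x, y, lr=0.05):
    # One gradient call covers weights, biases, and the size weight alike.
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

if __name__ == "__main__":
    x = jnp.linspace(-1.0, 1.0, 32).reshape(-1, 1)
    y = jnp.sin(3.0 * x)  # toy nonlinear regression target
    params = init_params(jax.random.PRNGKey(0), width=8)
    for _ in range(200):
        params = sgd_step(params, x, y)
    print("size weight:", float(params["size"]))
    print("effective units:", float(jnp.sum(soft_mask(params["size"], 8))))
```

The hidden layer is allocated at a fixed maximum width (8 here) and the mask decides how much of it participates, so in this sketch "growing" means the size weight drifting upward under the loss gradient rather than arrays being resized.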

Findings

01 Growing networks outperform static ones of similar size.

02 Starting small and growing can be more effective than starting large.

03 Scaling relations between growing and static networks are identified.

Abstract

In contrast to conventional artificial neural networks, which are structurally static, we present two approaches for evolving small networks into larger ones during training. The first method employs an auxiliary weight that directly controls network size, while the second uses a controller-generated mask to modulate neuron participation. Both approaches optimize network size through the same gradient-descent algorithm that updates the network's weights and biases. We evaluate these growing networks on nonlinear regression and classification tasks, where they consistently outperform static networks of equivalent final size. We then explore the hyperparameter space of these networks to find associated scaling relations relative to their static counterparts. Our results suggest that starting small and growing naturally may be preferable to simply starting large, particularly as neural…
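The second approach in the abstract, a controller-generated mask trained by the same gradient descent as the weights and biases, can be sketched roughly as follows. The abstract does not describe the controller's architecture or inputs, so the tiny constant-input controller here, along with the names `init`, `controller_mask`, and `train_step`, are hypothetical choices for illustration only.

```python
import jax
import jax.numpy as jnp

def init(key, width, ctrl_hidden=4):
    k1, k2, k3, k4 = jax.random.split(key, 4)
    net = {
        "w1": jax.random.normal(k1, (1, width)) * 0.5,
        "b1": jnp.zeros(width),
        "w2": jax.random.normal(k2, (width, 1)) * 0.5,
        "b2": jnp.zeros(1),
    }
    # Hypothetical controller: a small network with a constant input whose
    # output is one logit per hidden unit. The bias "c" starts out favoring
    # the first few units, so the network begins effectively small.
    ctrl = {
        "u": jax.random.normal(k3, (1, ctrl_hidden)) * 0.5,
        "v": jax.random.normal(k4, (ctrl_hidden, width)) * 0.5,
        "c": jnp.linspace(2.0, -2.0, width),
    }
    return {"net": net, "ctrl": ctrl}

def controller_mask(ctrl):
    z = jnp.tanh(jnp.ones((1, 1)) @ ctrl["u"])   # shape (1, ctrl_hidden)
    logits = z @ ctrl["v"] + ctrl["c"]           # shape (1, width)
    return jax.nn.sigmoid(logits)[0]             # participation in (0, 1)

def forward(params, x):
    net = params["net"]
    h = jnp.tanh(x @ net["w1"] + net["b1"])
    h = h * controller_mask(params["ctrl"])      # modulate neuron participation
    return h @ net["w2"] + net["b2"]

def loss(params, x, y):
    return jnp.mean((forward(params, x) - y) ** 2)

@jax.jit
def train_step(params, x, y, lr=0.05):
    # A single jax.grad call yields gradients for both the main network
    # and the controller, so both are updated by the same descent step.
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

if __name__ == "__main__":
    x = jnp.linspace(-1.0, 1.0, 32).reshape(-1, 1)
    y = jnp.sin(3.0 * x)
    params = init(jax.random.PRNGKey(0), width=8)
    for _ in range(200):
        params = train_step(params, x, y)
    print("mask:", controller_mask(params["ctrl"]))
```

Keeping the controller parameters in the same pytree as the network parameters is what lets one optimizer step handle both; a separate optimizer or learning rate for the controller would be an equally plausible design.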

Code & Models

Repositories

NonlinearArtificialIntelligenceLab/N3 (JAX, official)

Taxonomy

Topics: Neural Networks and Applications

Methods: Pruning