Simultaneous Weight and Architecture Optimization for Neural Networks
Zitong Huang, Mansooreh Montazerin, Ajitesh Srivastava

TL;DR
This paper presents a novel training framework that learns neural network architecture and parameters simultaneously through gradient descent, enabling the discovery of sparse, compact, and high-performing networks without traditional NAS steps.
Contribution
It introduces a multi-scale encoder-decoder architecture that embeds neural networks for joint optimization of structure and weights via gradient descent.
Findings
Successfully discovers sparse, compact neural networks.
Maintains high performance comparable to traditional methods.
Demonstrates effectiveness across datasets.
Abstract
Neural networks are trained by choosing an architecture and training the parameters. The choice of architecture is often by trial and error or with Neural Architecture Search (NAS) methods. While NAS provides some automation, it often relies on discrete steps that optimize the architecture and then train the parameters. We introduce a novel neural network training framework that fundamentally transforms the process by learning architecture and parameters simultaneously with gradient descent. With the appropriate setting of the loss function, it can discover sparse and compact neural networks for given datasets. Central to our approach is a multi-scale encoder-decoder, in which the encoder embeds pairs of neural networks with similar functionalities close to each other (irrespective of their architectures and weights). To train a neural network with a given dataset, we randomly sample a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNeural Networks and Applications
