Neural Network Diffusion
Kai Wang, Dongwen Tang, Boya Zeng, Yida Yin, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, and Yang You

TL;DR
This paper introduces a novel method using diffusion models and autoencoders to generate high-performing neural network parameters, demonstrating consistent improvements across architectures and datasets without memorization.
Contribution
It presents a simple approach combining autoencoders and diffusion models to synthesize neural network parameters, a new application of diffusion models in neural network generation.
Findings
Generated models match or outperform trained networks.
Models are not simply memorizing training data.
Method is effective across various architectures and datasets.
Abstract
Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a diffusion model. The autoencoder extracts latent representations of a subset of the trained neural network parameters. Next, a diffusion model is trained to synthesize these latent representations from random noise. This model then generates new representations, which are passed through the autoencoder's decoder to produce new subsets of high-performing network parameters. Across various architectures and datasets, our approach consistently generates models with comparable or improved performance over trained networks, with minimal additional cost. Notably, we empirically find that the generated models are not memorizing the trained…
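The pipeline described in the abstract (encode parameter subsets, run diffusion in the latent space, decode samples back into parameters) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes synthetic "checkpoints" in place of trained networks, a PCA projection as a stand-in for the autoencoder, and a per-timestep linear least-squares noise predictor as a stand-in for the learned denoising network. All names and sizes (`params`, `encode`, `decode`, `fit_denoisers`, `sample`, `K`, `D`, `Z`, `T`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for K trained checkpoints. Each row is a
# flattened subset of parameters of dimension D (not real network weights).
K, D, Z, T = 200, 64, 4, 50
base = rng.normal(size=D)
basis = rng.normal(size=(Z, D))
params = (base + 0.1 * rng.normal(size=(K, Z)) @ basis
          + 0.01 * rng.normal(size=(K, D)))

# ASSUMPTION: a linear autoencoder obtained from PCA replaces the paper's
# autoencoder. Latents are whitened to zero mean and unit variance.
mean = params.mean(axis=0)
_, s, vt = np.linalg.svd(params - mean, full_matrices=False)
components = vt[:Z]                      # (Z, D) decoder directions
scale = s[:Z] / np.sqrt(K - 1)           # per-latent standard deviation

def encode(w):
    return (w - mean) @ components.T / scale

def decode(z):
    return (z * scale) @ components + mean

latents = encode(params)                 # (K, Z)

# Standard DDPM variance schedule.
betas = np.linspace(1e-4, 0.2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def fit_denoisers(z0, n_rep=20):
    """Fit one linear noise predictor per timestep by least squares.

    ASSUMPTION: this replaces the neural denoiser of a real diffusion model.
    """
    weights = []
    for t in range(T):
        x0 = np.tile(z0, (n_rep, 1))
        eps = rng.normal(size=x0.shape)
        # Forward process: noise the clean latents to timestep t.
        xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
        features = np.hstack([xt, np.ones((len(xt), 1))])
        w, *_ = np.linalg.lstsq(features, eps, rcond=None)
        weights.append(w)                # (Z + 1, Z)
    return weights

def sample(weights, n):
    """DDPM ancestral sampling, starting from pure Gaussian noise."""
    x = rng.normal(size=(n, Z))
    for t in reversed(range(T)):
        features = np.hstack([x, np.ones((n, 1))])
        eps_hat = features @ weights[t]
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x

weights = fit_denoisers(latents)
new_latents = sample(weights, n=8)
generated = decode(new_latents)          # (8, D) new parameter subsets
print(generated.shape)
```

In an actual setting, each row of `generated` would be loaded back into the corresponding layer of a network and evaluated. The linear components here are only adequate because the toy data is Gaussian by construction.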
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
1. The research topic is pretty interesting and novel. Instead of using SGD to train neural networks, we can use a generative approach to generate the parameters as well. 2. Experiments are conducted on multiple classification datasets with promising results over SGD learning.
1. My main questions concern the set of parameters that is generated by the diffusion model for each network architecture. - Do the authors try to generate all the parameters of a model? The paper does not mention it. - What is the number of parameters that are generated for each task in Table 1? The paper does not mention it. - For the ViT architecture, I think many of them do not use batch normalization but instead use group normalization, instance normalization, et
- The paper is tackling an interesting problem; neural network parameter generation is personally an interesting research topic. - The paper is fairly well-written and easy to follow. The algorithm is straightforward and the authors provided enough details, so I imagine it to be easily reproducible. - The paper provides extensive qualitative analysis and ablation study about the proposed method.
- The novelty is limited. The very ideas of "neural network diffusion" or "using diffusion model for neural network parameters" or even the concept of "treating neural network parameters as an object to define a generative model", are not new. If this paper had been the pioneering source for these ideas, its contribution would have been particularly commendable. The only difference from the previous works is that the diffusion is applied to the latent space, which is also a standard approach for
As far as I know, this is the first work that utilises a diffusion model for this specific task. The method is clearly explained in the paper. The writing is easy to follow.
The main weakness of this paper is the limited novelty of the proposed approach. IMO, this is just a direct application/adaptation of stable diffusion to another application. In principle, there's nothing new about the proposed approach. Additionally, I found the presentation of the experimental results confusing. I will list the questions in the section below.
Taxonomy
Topics: Neural Networks and Applications
Methods: Diffusion
