Neural Network Diffusion

Kai Wang, Dongwen Tang, Boya Zeng, Yida Yin, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, and Yang You

arXiv:2402.13144 · cs.LG · January 3, 2025 · 2 cites

TL;DR

This paper introduces a method that uses an autoencoder and a diffusion model to generate high-performing neural network parameters. Across architectures and datasets, the generated models perform comparably to or better than trained networks, without memorization.

Contribution

The paper presents a simple approach that combines an autoencoder with a diffusion model to synthesize neural network parameters, a new application of diffusion models to neural network generation.

Findings

01. Generated models match or outperform trained networks.

02. Models are not simply memorizing training data.

03. Method is effective across various architectures and datasets.
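
This summary does not say how memorization was assessed. As a rough illustration only, one way to compare a generated classifier with a trained one is to measure how much their sets of misclassified samples overlap, using intersection over union (IoU). The function name `wrong_prediction_iou` and the toy predictions are hypothetical, and this measure is an assumption made for the sketch rather than a description of the authors' protocol.

```python
def wrong_prediction_iou(preds_a, preds_b, labels):
    """IoU of the sets of samples that each model misclassifies.

    A value near 1.0 means the two models fail on almost the same samples,
    which would be consistent with one being a near copy of the other.
    """
    wrong_a = {i for i, (p, y) in enumerate(zip(preds_a, labels)) if p != y}
    wrong_b = {i for i, (p, y) in enumerate(zip(preds_b, labels)) if p != y}
    union = wrong_a | wrong_b
    if not union:
        # Neither model makes a mistake: treat them as indistinguishable.
        return 1.0
    return len(wrong_a & wrong_b) / len(union)


# Toy data (hypothetical): six samples, three classes.
labels = [0, 1, 2, 1, 0, 2]
trained_preds = [0, 1, 1, 1, 0, 0]    # wrong on samples 2 and 5
generated_preds = [0, 2, 2, 1, 0, 0]  # wrong on samples 1 and 5

similarity = wrong_prediction_iou(trained_preds, generated_preds, labels)
print(f"IoU of wrong predictions: {similarity:.3f}")
```

In this toy case the two models share one error out of three distinct errors, so the score is 1/3. A generated model that merely reproduced a trained checkpoint would instead score close to 1.0 against that checkpoint.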

Abstract

Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a diffusion model. The autoencoder extracts latent representations of a subset of the trained neural network parameters. Next, a diffusion model is trained to synthesize these latent representations from random noise. This model then generates new representations, which are passed through the autoencoder's decoder to produce new subsets of high-performing network parameters. Across various architectures and datasets, our approach consistently generates models with comparable or improved performance over trained networks, with minimal additional cost. Notably, we empirically find that the generated models are not memorizing the trained…
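
The data flow described in the abstract (encode a parameter subset, run diffusion in latent space, decode new parameters) can be sketched roughly as follows. This is a toy illustration under heavy assumptions, not the authors' implementation: the checkpoints are synthetic, PCA stands in for the trained autoencoder, and a closed-form Gaussian denoiser stands in for the trained diffusion network. All names and sizes (`K`, `D`, `LATENT`, `T`, the noise schedule) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for K saved checkpoints of one parameter subset,
# each flattened to a vector of length D. Real checkpoints would come
# from training; here they are synthetic points near a LATENT-dim subspace.
K, D, LATENT = 200, 64, 8
base = rng.normal(size=D)
directions = rng.normal(size=(LATENT, D))
checkpoints = base + 0.05 * rng.normal(size=(K, LATENT)) @ directions

# 1) "Autoencoder": PCA as a linear encoder/decoder (an assumption; the
#    abstract describes a trained autoencoder).
mean = checkpoints.mean(axis=0)
_, _, vt = np.linalg.svd(checkpoints - mean, full_matrices=False)
components = vt[:LATENT]  # orthonormal rows, shape (LATENT, D)

def encode(params):
    return (params - mean) @ components.T

def decode(latents):
    return latents @ components + mean

latents = encode(checkpoints)  # shape (K, LATENT)

# 2) Diffusion over latents: DDPM-style linear noise schedule (assumed).
T = 100
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Per-dimension Gaussian fit of the latents. It gives E[eps | z_t] in
# closed form and replaces the trained noise-prediction network.
mu, var = latents.mean(axis=0), latents.var(axis=0)

def predict_noise(z_t, t):
    ab = alpha_bars[t]
    return np.sqrt(1 - ab) * (z_t - np.sqrt(ab) * mu) / (ab * var + 1 - ab)

def sample_latents(n):
    """Ancestral DDPM sampling: start from noise, denoise step by step."""
    z = rng.normal(size=(n, LATENT))
    for t in reversed(range(T)):
        eps_hat = predict_noise(z, t)
        z = (z - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Step variance sigma_t^2 = beta_t, one common DDPM choice.
            z = z + np.sqrt(betas[t]) * rng.normal(size=z.shape)
    return z

# 3) Decode sampled latents into new parameter vectors for the same subset.
new_params = decode(sample_latents(16))
print(new_params.shape)
```

In a real setting, each row of `new_params` would be reshaped and loaded back into the corresponding layers of the network, with the remaining parameters left as trained, and the resulting model evaluated on held-out data.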

Peer Reviews

Decision: ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. The research topic is pretty interesting and novel. Instead of using SGD for neural networks learning, we can use a generative approach to generate the parameters as well.
2. Experiments are conducted on multiple classification datasets with promising results over SGD learning.

Weaknesses

1. My main question concerns the set of parameters that the diffusion model generates for each network architecture.
   - Do the authors try to generate all the parameters of a model? The paper does not mention it.
   - What is the number of parameters that are being generated for each task in Table 1? The paper does not mention it.
   - For the ViT architecture, I think many of them do not use batch normalization but instead use group normalization, instance normalization, et

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

- The paper is tackling an interesting problem; neural network parameter generation is personally an interesting research topic.
- The paper is fairly well-written and easy to follow. The algorithm is straightforward and the authors provided enough details, so I imagine it to be easily reproducible.
- The paper provides extensive qualitative analysis and ablation study about the proposed method.

Weaknesses

- The novelty is limited. The very ideas of "neural network diffusion" or "using diffusion model for neural network parameters" or even the concept of "treating neural network parameters as an object to define a generative model", are not new. If this paper had been the pioneering source for these ideas, its contribution would have been particularly commendable. The only difference from the previous works is that the diffusion is applied to the latent space, which is also a standard approach for

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

As far as I know, this is the first work that utilises a diffusion model for this specific task. The method is clearly explained in the paper. The writing is easy to follow.

Weaknesses

The main weakness of this paper is the limited novelty of the proposed approach. IMO, this is just a direct application/adaption of stable diffusion for another application. In principle, there's nothing new about the proposed approach. Additionally, I found the presentation of the experimental results confusing. I will list the questions in the section below.

Taxonomy

Topics: Neural Networks and Applications

Methods: Diffusion