Diffusion-Based Neural Network Weights Generation
Bedionita Soro, Bruno Andreis, Hayeon Lee, Wonyong Jeong, Song Chong, Frank Hutter, Sung Ju Hwang

TL;DR
This paper introduces D2NWG, a diffusion-based method for generating neural network weights conditioned on target datasets, improving transfer learning efficiency and performance across various models, including large language models.
Contribution
The paper presents a novel diffusion-based approach to neural network weight generation that generalizes across tasks and models, surpassing existing meta-learning methods and pretrained models.
Findings
Outperforms state-of-the-art meta-learning methods.
Scalable to large architectures like LLMs.
Enhances performance of diverse base models.
Abstract
Transfer learning has gained significant attention in recent deep learning research due to its ability to accelerate convergence and enhance performance on new tasks. However, its success is often contingent on the similarity between source and target data, and training on numerous datasets can be costly, leading to blind selection of pretrained models with limited insight into their effectiveness. To address these challenges, we introduce D2NWG, a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning, conditioned on the target dataset. Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation, learning the weight distributions of models pretrained on various datasets. This allows for automatic generation of weights that generalize…
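The latent diffusion idea described in the abstract can be illustrated with a minimal sketch of the standard DDPM forward (noising) step applied to a flattened weight vector. This is not the authors' implementation: the function names, the linear beta schedule, and conditioning by simple concatenation are all assumptions made for illustration, and the flattened weights stand in for the latent code that a learned weight encoder would normally produce.

```python
import math
import random


def flatten_weights(layers):
    """Concatenate per-layer weight lists into one flat vector."""
    return [w for layer in layers for w in layer]


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default; assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps


def denoiser_input(zt, t, num_steps, dataset_embedding):
    """Hypothetical conditioning: noisy latent + timestep + dataset embedding."""
    return zt + [t / num_steps] + dataset_embedding


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "pretrained model": two layers with a handful of weights.
    z0 = flatten_weights([[0.1, -0.2], [0.3]])
    alpha_bars = cumulative_alphas(linear_beta_schedule(10))
    zt, eps = noise_latent(z0, 5, alpha_bars, rng)
    # A denoising network would be trained to predict eps from this input.
    x = denoiser_input(zt, 5, 10, dataset_embedding=[1.0, 0.0])
    print(len(x))
```

In a full system, a network would be trained to predict `eps` from `denoiser_input(...)`, and sampling would run the reverse process from pure noise, conditioned on an embedding of the target dataset, before decoding the result back into weights.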
Peer Reviews
Decision: ICLR 2025 Poster
- The proposed approach is conceptually simpler to train and test than previous meta-learning approaches, which often involve solving a bi-level optimization problem.
- The dataset conditioning makes the approach more general, extending it to new, unseen datasets.
- Extensive experiments are presented for different settings such as few-shot learning, zero-shot learning, model retrieval, classifier head adaptation, LoRA weight generation, and adapting LLM weights for specific tasks.

- Although the authors argue that previous approaches rely on the diversity of training architectures and datasets for generalization, the proposed approach has limitations along the same axis: it may only generalize to new datasets that are in-distribution for the dataset encoder; otherwise the dataset encodings can be noisy. For instance, if the dataset encoder is only trained on ImageNet-like natural images, it may not generalize to unseen dataset distributions like medical images. Table 17 s

1. The paper investigates an interesting problem, the methodology is promising, and the paper is well-written and easy to read.
2. The experimental results are very solid, covering diverse tasks across a variety of datasets. These results outperform state-of-the-art meta-learning methods and pre-trained models.

1. Although the idea is interesting and the methodology looks promising, my main concern is about the practical significance of the proposed method, considering that training an effective diffusion model requires substantial computational resources. I will expand on this concern below:
2. Tables 1 and 2 show that D2NWG outperforms several baselines. Notably, D2NWG is tagged with 'CH', indicating that it only modifies the last classifier layer, similar to linear probing. This suggests that it may

1. The authors considered a wide range of evaluation settings (zero/few-shot learning, model retrieval, fine-tuning, domain transfer, conditioned weight generation, etc.) and provided comprehensive experiments.
2. The authors considered multiple ways to vectorize the model weights and encode the datasets.

1. **High Workload and Tuning Challenges Due to Additional VAE Training**: The additional VAE for parameter encoding introduces significant computational workload and requires hyperparameter tuning, making it less scalable and potentially costly. What steps could be taken to simplify this process, and are there alternative methods for parameter encoding that maintain performance but reduce overhead?
2. **Ambiguity in Dataset Encoding Choices in Section 3.3**: The paper mentions several methods
Taxonomy
Topics: Advanced Sensor and Control Systems · Industrial Technology and Control Systems · Advanced Algorithms and Applications
Methods: Balanced Selection · Sparse Evolutionary Training · Diffusion · Latent Diffusion Model
