Generative Modeling of Weights: Generalization or Memorization?
Boya Zeng, Yida Yin, Zhiqiu Xu, Zhuang Liu

TL;DR
This paper critically evaluates generative models for neural network weights, revealing they mainly memorize training data rather than generate novel, high-performing weights, and highlights the need for better design and evaluation methods.
Contribution
It provides an empirical analysis showing current generative weight models tend to memorize rather than generate new weights, challenging prior claims of their novelty.
Findings
Generative models mostly memorize training weights rather than produce novel ones.
Simple baselines like weight noise or ensembling perform as well as or better than the generative models.
Limited data and overparameterization contribute to memorization issues.
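The two baselines named above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each checkpoint is a flat list of floats, that "weight noise" means adding Gaussian noise, and that "ensembling" means element-wise averaging of checkpoints. The function names `add_noise` and `average_weights` and the `sigma` parameter are hypothetical.

```python
import random


def add_noise(weights, sigma, rng):
    """Return a copy of a flat weight vector with Gaussian noise added.

    Assumption: i.i.d. noise with standard deviation `sigma` per weight.
    """
    return [w + rng.gauss(0.0, sigma) for w in weights]


def average_weights(checkpoints):
    """Element-wise mean of several flat weight vectors of equal length.

    Assumption: this stands in for the "simple weight ensemble" baseline.
    """
    n = len(checkpoints)
    return [sum(ws) / n for ws in zip(*checkpoints)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ckpt_a = [0.5, -1.0, 2.0]  # toy stand-ins for trained checkpoints
ckpt_b = [1.5, 1.0, 0.0]

noisy = add_noise(ckpt_a, sigma=0.01, rng=rng)
merged = average_weights([ckpt_a, ckpt_b])
```

Either output differs from every training checkpoint by construction, which is why such baselines are a natural reference point for any claim of novelty.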
Abstract
Generative models have recently been explored for synthesizing neural network weights. These approaches take neural network checkpoints as training data and aim to generate high-performing weights during inference. In this work, we examine four representative, well-known methods on their ability to generate novel model weights, i.e., weights that are different from the checkpoints seen during training. Contrary to claims in prior work, we find that these methods synthesize weights largely by memorization: they produce either replicas, or, at best, simple interpolations of the training checkpoints. Moreover, they fail to outperform simple baselines, such as adding noise to the weights or taking a simple weight ensemble, in obtaining different and simultaneously high-performing models. Our further analysis suggests that this memorization might result from limited data, overparameterized…
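One simple way to probe whether a generated weight vector is a replica of a training checkpoint is to look up its nearest training checkpoint. The sketch below is an illustration under stated assumptions, not the metric used in the paper: it assumes flat weight vectors and plain L2 distance, and the names `l2_distance` and `nearest_training_distance` are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two flat weight vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_training_distance(generated, training_set):
    """Return (index, distance) of the training checkpoint closest to `generated`."""
    distances = [l2_distance(generated, t) for t in training_set]
    idx = min(range(len(distances)), key=distances.__getitem__)
    return idx, distances[idx]


training = [[0.0, 0.0], [3.0, 4.0]]  # toy training checkpoints
replica_like = [3.0, 4.0]            # identical to training[1]
distinct = [0.0, 5.0]                # not equal to any training checkpoint

replica_hit = nearest_training_distance(replica_like, training)
distinct_hit = nearest_training_distance(distinct, training)
```

A nearest-neighbor distance near zero suggests a replica. Judging whether a larger distance is meaningfully novel requires a reference scale, for example the typical distance between independently trained checkpoints.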
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The paper proposes and empirically validates three hypotheses regarding memorization phenomena in the generative modeling of weights.
- It provides clear quantitative indicators to measure whether a generative model reproduces training samples by memorization or by genuine generalization.
- The analysis sheds light on the underlying mechanisms of model weight generation, offering insights for understanding and evaluating future generative approaches in parameter space.
- The noise-added baseline experiments are somewhat questionable, as recent findings do not sufficiently justify the smoothness of model weight space. Adding random noise in weight space may not correspond to meaningful perturbations in function space.
- The related work section (2.2) lacks a comprehensive discussion of literature on symmetry and invariance in model weight space. Since the paper focuses on analyzing the geometry and structure of weight distributions rather than proposing a new m
1) The paper poses an important and interesting question regarding the capabilities of modern generative models, specifically in the domain of weight generation.
2) The paper is convincing in showing that generated samples of weights are very similar to those found in the training set.
3) The paper is clearly written and easy to follow.
My main concern revolves around defining and measuring novelty, the key issue on which this paper focuses. The only definition I found in the paper is in the abstract ("weights that are different from the checkpoints seen during training"), and it does not seem to fall in line with other details in the paper (more on that under point (2)). I think that the paper would benefit greatly from a clear definition of novelty, and a sober look at whether this should be expected from these weight generation models,
* This work is well written and easy to follow.
* This is a very useful area of research: if we are able to produce the posterior distribution over neural network weights given some task or data, then many multi-agent or ensembling techniques will benefit. So a critical study of what these approaches are really learning is important.
* The tests regarding generalization and memorization are sensible. Any method that purports to learn to generate weights should "pass" these tests.
* The conclusions are underwhelming. I would expect some (even general) prescription of what can be done to mitigate the issues here. From the appendix, it seems clear that the identified levers, e.g. training data and model capacity, do not actually play a role in improving the behavior of these methods.
* There is a mismatch in the amount of evidence presented in this study for each claim. Lots of space was dedicated to showing that the output weights are quite close to the training data. H
1. Originality: The paper addresses a critical, fundamental question that challenges the prevailing narrative in the emerging area of generative modeling for neural network weights: whether high task performance actually indicates generalization or merely sophisticated data reproduction. This study is highly original as it moves beyond standard quality metrics (like downstream accuracy) to introduce and apply rigorous novelty and memorization metrics suitable for weight space data (e.g., compari
I think the key weaknesses lie in the incomplete mechanistic understanding of why certain mitigations fail and the generalization constraints imposed by the training data sources themselves. 1. Inconsistent Results on Data Scaling: While the paper successfully demonstrates that scaling up the training data reduces memorization for G.pt (Table 2, Figure 9), it notes that scaling data does not mitigate memorization in Hyper-Representations (Table 7). This inconsistency, highlighted as potentially
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Games
Methods: Diffusion
