TL;DR
This paper introduces Score Anisotropy Directions (SADs), a novel architecture-dependent framework that reveals how score-based generative models encode data structure and predict their generalization ability before training.
Contribution
We propose SADs as a new method to analyze and predict the inductive biases and performance of score-based generative models based on network architecture.
Findings
SADs form adaptive bases aligned with network geometry.
SADs reliably predict model behavior on synthetic and real data.
SADs correlate with downstream performance metrics.
Abstract
We investigate the role of network architecture in shaping the inductive biases of modern score-based generative models. To this end, we introduce the Score Anisotropy Directions (SADs), architecture-dependent directions that reveal how different networks preferentially capture data structure. Our analysis shows that SADs form adaptive bases aligned with the architecture's output geometry, providing a principled way to predict generalization ability in score models prior to training. Through both synthetic data and standard image benchmarks, we demonstrate that SADs reliably capture fine-grained model behavior and correlate with downstream performance, as measured by Wasserstein metrics. Our work offers a new lens for explaining and predicting directional biases of generative models.
Peer Reviews
Decision·Submitted to ICLR 2026
1. **Clarity**: The work is clearly presented, with informative visualizations of preliminary results on iDDPM that strengthen the motivation. The logic is largely linear and the paper is easy to follow. The theory is well articulated, and key takeaways are clearly summarized at the end of each block of analysis and findings.
2. **Quality**: The proposed theories are proved comprehensively and coherently on the importance of directional preference in network architectures and the i
1. Although the claimed discoveries are supported by experiments, the datasets used are low-resolution, the highest being only $56\times 56$. Experiments in higher dimensions may yield results contrary to the theoretical findings. It would be more comprehensive to run experiments on datasets such as ImageNet at a resolution of $256\times 256$.
2. Before this work, there is already existing research discussing and modifying generative models on direction
This work presents a new lens for studying the performance of score-based generative models. It even offers some novel insights, such as a training recipe that applies appropriate rigid motions to align the data with the SADs, and the bold claim that subspaces with small eigenvalues are easier for the models to learn. The central topic itself is a timely one, as the impact of generative models is rapidly growing, while we still have little understanding of how they work. I can see this work se
As much as I like the idea and the approach, I also have some questions and concerns about the current state of the paper. The writing, especially in the introduction, has room for improvement. For example, while the introduction claims to provide a precise notion of "geometry" (around line numbers 46--48), the very first definition of SADs (Definition 1) is somewhat vague. How is a "preference" measured, and what does it mean to "generate data *along* a direction"? I would also suggest movin
Overall, I think this is a good empirical paper that proposes an interesting phenomenon for score-based generative models. I especially appreciate the clarity around SADs, how to quantify the "preferred directions" in generative models, and the demonstration that lower data–geometry alignment leads to better generation quality. The result is simple to grasp through the manifold hypothesis of data distribution, yet seems to be broadly applicable across architectures and optimization algorithms.
The only weakness I can think of is that, while extensive experiments are conducted to verify the main conjecture, it is rigorously proved only for a linear DSM toy model. The paper would be stronger with an intuitive argument in a wide‑network/NTK or mean‑field limit showing why the SAD eigen‑ordering should persist and clarifying the conditions under which the conjecture holds.
