Weak-to-Strong Diffusion with Reflection

Lichen Bai; Masashi Sugiyama; Zeke Xie

arXiv:2502.00473·cs.LG·April 25, 2025

Open Access · 3 Reviews

TL;DR

Weak-to-Strong Diffusion (W2SD) is a novel framework that leverages differences between weak and strong models to improve diffusion generative models, enhancing quality and alignment with real data across multiple modalities.

Contribution

W2SD introduces a reflection-based method utilizing weak-to-strong model differences to guide diffusion models toward real data distribution, with broad applicability and significant performance gains.

Findings

01. Achieves state-of-the-art results across image and video tasks.

02. Significantly improves human preference and aesthetic quality.

03. Outperforms original models with minimal additional computational cost.

Abstract

The goal of diffusion generative models is to align the learned distribution with the real data distribution through gradient score matching. However, inherent limitations in training data quality, modeling strategies, and architectural design lead to an inevitable gap between generated outputs and real data. To reduce this gap, we propose Weak-to-Strong Diffusion (W2SD), a novel framework that utilizes the estimated difference between existing weak and strong models (i.e., the weak-to-strong difference) to bridge the gap between an ideal model and a strong model. By employing a reflective operation that alternates between denoising and inversion with the weak-to-strong difference, we theoretically understand that W2SD steers latent variables along sampling trajectories toward regions of the real data distribution. W2SD is highly flexible and broadly applicable, enabling diverse improvements…
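The reflective operation described in the abstract can be illustrated with a minimal 1D toy sketch. This is not the authors' implementation: the Gaussian score functions, the Euler-style `denoise_step` and `invert_step` helpers, the step size, and the choice to denoise with the strong model and invert with the weak model are all assumptions made for illustration. The point is only that one denoise-then-invert round trip shifts a latent by roughly the difference between the strong and weak scores.

```python
import numpy as np

def gaussian_score(x, mean, std=1.0):
    """Score (gradient of log-density) of a 1D Gaussian; a stand-in for a model."""
    return -(x - mean) / std**2

def denoise_step(x, score_fn, step):
    """Hypothetical denoising step: move along the model's score."""
    return x + step * score_fn(x)

def invert_step(x, score_fn, step):
    """Hypothetical inversion step: move against the model's score."""
    return x - step * score_fn(x)

def reflect(x, strong_score, weak_score, step):
    """One reflection: denoise with the strong model, invert with the weak one.

    The net displacement is approximately step * (strong_score - weak_score),
    i.e. a move along the weak-to-strong difference.
    """
    x_denoised = denoise_step(x, strong_score, step)
    return invert_step(x_denoised, weak_score, step)

# Toy assumption: the weak model is centered at 0.0 and the strong model at 1.0.
strong = lambda x: gaussian_score(x, mean=1.0)
weak = lambda x: gaussian_score(x, mean=0.0)

x0 = np.array([-1.0, 0.0, 0.5])
x1 = reflect(x0, strong, weak, step=0.1)
shift = x1 - x0  # positive: latents move in the weak-to-strong direction
```

In this toy setting the shift has the same sign as the difference between the strong and weak means. Whether that direction also points toward the real data distribution depends on the weak-to-strong difference approximating the strong-to-ideal difference, which is the assumption the reviewers question below.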

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

**(S1)**: The approach is simple and training-free, and can flexibly be used with various different models.
**(S2)**: Experimental results explore a variety of different diffusion models and architectures.
**(S3)**: A discussion on compute-aware sampling is provided, with considerations for sampling quality within a fixed wall-clock budget. This is relevant for broader applicability.

Weaknesses

**(W1)**: The proof of Theorem 1 is weak. There is no solid justification given to "neglect the approximation error" (L727). The proof ignores Jacobian terms or any formulation for the approximation error. No conditions are provided for when $\Delta_1 \approx \Delta_2$. Omitting these conditions is a significant weakness. This makes the justification more heuristic than theoretical.
**(W2)**: The authors note that when the gap magnitude is negative (i.e. strong model weaker than the weak model)

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. Simple, Effective, and General Idea: The core concept of using the difference between a weak and a strong model to approximate the direction of improvement is both intuitive and powerful. The framework's ability to operate on various types of "gaps" (weight, condition, etc.) demonstrates its impressive versatility.
2. Strong and Comprehensive Empirical Results: The paper provides extensive evidence of W2SD's effectiveness across multiple models (SD1.5, SDXL, DiT), tasks (image, video), and m

Weaknesses

1. Idealized Assumption of the "Weak-to-Strong" Gap: The framework's core assumption is that the weak-to-strong gap vector is a reliable proxy for the strong-to-ideal gap. While this holds in the controlled 1D/2D experiments where models differ mainly in data bias, it may be too idealized for real-world scenarios. In practice, a strong and weak model may have qualitatively different failure modes or "worldviews" due to differences in architecture or fine-tuning data. In such cases, their differe

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- The unified W2SD is general and applicable to different diffusion models.
- Strong results covering different regions of interest.

Weaknesses

Overall, the presentation of this paper is a problem. There are many figures and tables in both the main paper and the supplementary. However, the authors failed to organize them well in the text, making the paper slightly hard to read. A few concrete comments include:
- Figure 2 is hard to understand. No further explanations are associated with it.
- Algorithm 1 did not show anything specific about the W2SD algorithm, but a widely used procedure for inference-based optimizations.
- Eq 5 and

Taxonomy

Topics: Numerical methods in inverse problems · Advanced Mathematical Modeling in Engineering