GUD: Generation with Unified Diffusion
Mathis Gerdes, Max Welling, Miranda C. N. Cheng

TL;DR
This paper introduces GUD, a unified diffusion framework inspired by the renormalization group in physics, which allows flexible choices of representation, prior, and component-wise noise schedule to improve generative modeling.
Contribution
It presents a unified framework for diffusion models that combines flexible choices of representation, prior, and noise schedule, enabling smooth interpolation between diffusion and autoregressive models.
Findings
Enhanced design flexibility for diffusion models.
Bridging diffusion and autoregressive approaches.
Potential for more efficient training and generation.
Abstract
Diffusion generative models transform noise into data by inverting a process that progressively adds noise to data samples. Inspired by concepts from the renormalization group in physics, which analyzes systems across different scales, we revisit diffusion models by exploring three key design aspects: 1) the choice of representation in which the diffusion process operates (e.g. pixel-, PCA-, Fourier-, or wavelet-basis), 2) the prior distribution that data is transformed into during diffusion (e.g. Gaussian with covariance Σ), and 3) the scheduling of noise levels applied separately to different parts of the data, captured by a component-wise noise schedule. Incorporating the flexibility in these choices, we develop a unified framework for diffusion generative models with greatly enhanced design freedom. In particular, we introduce soft-conditioning models that smoothly…
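To make the three design aspects concrete, here is a minimal NumPy sketch of a single forward noising step applied component by component in a chosen basis. It is an illustration under stated assumptions, not the authors' implementation: the function name `componentwise_noising`, the random orthonormal basis standing in for a PCA or Fourier basis, the unit-variance Gaussian prior, and the variance-preserving form `alpha**2 + sigma**2 = 1` are all choices made here for the example.

```python
import numpy as np

def componentwise_noising(x0, basis, alpha, rng):
    """Noise x0 in an orthonormal basis with one signal level per component.

    Assumptions (hypothetical, for illustration only):
      - `basis` has orthonormal columns (e.g. PCA directions),
      - the prior is a unit-variance Gaussian in that basis,
      - the step is variance preserving: sigma_i = sqrt(1 - alpha_i**2).
    """
    coeffs = basis.T @ x0                      # 1) move to the chosen representation
    sigma = np.sqrt(1.0 - alpha**2)            # 3) per-component noise level
    eps = rng.standard_normal(coeffs.shape)    # 2) sample from the (unit Gaussian) prior
    noisy_coeffs = alpha * coeffs + sigma * eps
    return basis @ noisy_coeffs                # map back to the original space

rng = np.random.default_rng(0)
d = 8
# Random orthonormal basis as a stand-in for PCA/Fourier/wavelet directions.
basis, _ = np.linalg.qr(rng.standard_normal((d, d)))
x0 = rng.standard_normal(d)

# Component 0 keeps all of its signal, the last component is pure noise.
alpha = np.linspace(1.0, 0.0, d)
xt = componentwise_noising(x0, basis, alpha, rng)
```

Setting every entry of `alpha` to the same value recovers an ordinary diffusion step in which all components are noised at the same rate. A non-identity prior covariance could be handled by rescaling `eps` per component, which this sketch leaves out.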
Peer Reviews
Decision·Submitted to ICLR 2025
The paper is well explained. The authors bring attention to the flexibility in the diffusion model paradigm, though as discussed below this has been discussed in many prior papers. The authors introduce what I believe to be a novel interpretation and use case for time-varying diffusion scale timers, leading to an autoregressive type forward process, applying noise to separate components independently. A similar procedure was used for diffusion in frequency space by applying different diffusio
## Weakness 1
While the authors attempt to unify the design of dynamics for references; two of the three ideas proposed are not novel so it is unclear what the main contributions of the paper are.
1) Using a change of basis
Applying diffusion in a transformed space / change of basis has been done before. Although [1] focuses on change of basis to frequency basis, section 4.1 of [1] explicitly explains how any other change of basis can be performed. I do not see any compelling evidence to sug
The Generative Unified Diffusion (GUD) model provides a novel unification of diffusion and autoregressive generative approaches, allowing a flexible transition between simultaneous and sequential generation processes. This ability to bridge methods expands the framework’s application to a broad spectrum of tasks, from inpainting and sequential data extension to standard generative modeling. By creating a model that can interpolate between different generative styles, GUD allows developers to tai
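The transition between simultaneous and sequential generation described in this review can be pictured with a toy component-wise schedule. The sketch below is hypothetical and is not the parametrization used in the paper: the function `soft_schedule`, the `overlap` parameter, the cosine signal level, and the evenly spaced time windows are all assumptions made for illustration. With `overlap=1.0` every component is noised over the whole interval (standard diffusion); with `overlap=0.0` the components are noised one after another in disjoint windows (an autoregressive-style ordering).

```python
import numpy as np

def soft_schedule(t, num_components, overlap):
    """Per-component signal level alpha_i(t) for t in [0, 1].

    Hypothetical schedule: component i is noised during the window
    [start_i, start_i + width]; `overlap` in [0, 1] sets the window width.
    """
    d = num_components
    # Window width grows from 1/d (disjoint windows) to 1 (shared window).
    width = 1.0 / d + overlap * (1.0 - 1.0 / d)
    # Evenly spaced start times so that the last window ends exactly at t = 1.
    starts = np.arange(d) * (1.0 - width) / max(d - 1, 1)
    # Local time of each component inside its own window, clipped to [0, 1].
    local_t = np.clip((t - starts) / width, 0.0, 1.0)
    # Cosine signal level: 1 before the window, 0 after it.
    return np.cos(0.5 * np.pi * local_t)

# Halfway through the process with four components:
simultaneous = soft_schedule(0.5, 4, overlap=1.0)  # all components equally noised
sequential = soft_schedule(0.5, 4, overlap=0.0)    # first two noised, last two clean
```

Intermediate values of `overlap` give partially overlapping windows, so earlier components are only partly noised when later ones start, which is one simple way to picture an interpolation between the two regimes.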
The GUD framework is flexible, and consequently introduces significant computational complexity. Each configuration, such as basis choice (PCA, Fourier, wavelet) and component-wise noise scheduling, requires tuning, making the model resource-intensive. This complexity can hinder scalability, especially in high-dimensional data applications where each choice impacts the computational load. Architecturally, GUD’s design adds complexity by requiring modifications like cross-attention mechanisms fo
1. The paper addresses limitations in standard diffusion models by proposing an interesting and innovative Generative Unified Diffusion (GUD) model.
2. The theoretical foundation of the paper is solid, and the presentation is clear.
3. The analyses and designs within the GUD framework are novel and potentially valuable across multiple applications.
1. **Limited empirical evaluation:** The experiments primarily serve to validate the proposed designs (pixel/PCA/FFT). While these results offer some insights, the evaluation lacks depth, particularly in quantifying each design’s impact on GUD's performance. More comprehensive quantitative and qualitative results would better demonstrate the effectiveness of each design.
2. **Limited practical application contribution:** Although the paper suggests various potential applications, it appears the
Taxonomy
Topics: BIM and Construction Integration · Advanced Manufacturing and Logistics Optimization · Multimedia Communication and Technology
Methods: Diffusion
