A Simple Approach to Unifying Diffusion-based Conditional Generation
Xirui Li, Charles Herrmann, Kelvin C.K. Chan, Yinxiao Li, Deqing Sun, Chao Ma, Ming-Hsuan Yang

TL;DR
This paper presents a simple, unified diffusion-based framework for diverse conditional image generation tasks, achieving comparable or better results than specialized or complex models with minimal additional parameters.
Contribution
The authors introduce a single, efficient diffusion model that unifies various conditional generation tasks without complex training or architectural modifications.
Findings
Comparable results to specialized methods
Better than prior unified approaches
Supports multi-signal conditional generation
Abstract
Recent progress in image generation has sparked research into controlling these models through condition signals, with various methods addressing specific challenges in conditional generation. Instead of proposing another specialized technique, we introduce a simple, unified framework to handle diverse conditional generation tasks involving a specific image-condition correlation. By learning a joint distribution over a correlated image pair (e.g. image and depth) with a diffusion model, our approach enables versatile capabilities via different inference-time sampling schemes, including controllable image generation (e.g. depth to image), estimation (e.g. image to depth), signal guidance, joint generation (image & depth), and coarse control. Previous attempts at unification often introduce significant complexity through multi-stage training, architectural modification, or increased…
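The idea that one joint model yields several tasks "via different inference-time sampling schemes" can be illustrated with a small sketch. It assumes the independent (disentangled) per-signal timestep scheduling that the reviews mention: each of the two signals gets its own noise level, and the task is chosen by which signal is held clean. The function name `timestep_pairs`, the mode names, and the linear schedule are hypothetical and are not taken from the paper or its code; signal guidance and coarse control are not covered here.

```python
from typing import List, Tuple


def timestep_pairs(mode: str, num_steps: int = 5, t_max: int = 1000) -> List[Tuple[int, int]]:
    """Return (t_image, t_condition) for each sampling step.

    Hypothetical illustration: a timestep of 0 means "this signal is clean
    and given", while a descending timestep means "this signal is being
    denoised from noise".
    """
    # Evenly spaced noise levels from t_max downward (integer arithmetic).
    descending = [t_max * (num_steps - i) // num_steps for i in range(num_steps)]

    if mode == "joint":
        # Both signals start as noise and are denoised together.
        return [(t, t) for t in descending]
    if mode == "condition_to_image":
        # Condition (e.g. depth) is given and stays clean; the image is generated.
        return [(t, 0) for t in descending]
    if mode == "image_to_condition":
        # Image is given and stays clean; the condition (e.g. depth) is estimated.
        return [(0, t) for t in descending]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("joint", "condition_to_image", "image_to_condition"):
        print(mode, timestep_pairs(mode))
```

In an actual sampler, each pair would be passed to the denoising network at the corresponding step; the point of the sketch is only that the same trained model serves all three tasks and that the schedule alone selects between them.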
Peer Reviews
Decision·ICLR 2025 Poster
The main strengths are in their lightweight configuration, good reported performance, and novelty in using independent timestep scheduling.
- The writing of the paper is clear with comprehensive evaluations supporting the superiority of the proposed method.
- The model is overall lightweight in terms of the size and the training time, compared to previous image conditional add-ons, e.g., ControlNets. This makes the method application-friendly.
- Using disentangled noise level scheduling from Di
Although I believe the current version of the manuscript is above the acceptance threshold, there are some limitations that prevent me from recommending it for higher honors (e.g., Highlight/Oral).
- Although five image-conditional models are trained using the proposed framework, only three types (Depth, SoftEdge, Pose) appear to be compared quantitatively. This asymmetry between the quantitative and qualitative demonstrations makes the manuscript incomplete.
- Moreover, there are also other metrics that
- Unified framework for handling diverse conditional generation tasks through a joint distribution approach
- Lightweight adaptation of existing diffusion models with minimal parameter overhead
- Clear empirical demonstration of the proposed framework
1. **Missing Comparisons/References**
   * The paper lacks comparisons with several important recent methods in depth estimation, and this limits our understanding of where the method stands in relation to the current state-of-the-art in depth estimation.
     - ZoeDepth
     - Depth Anything
     - Depth Anything v2 (which is only used as an annotator in this work)
   * In addition, it will be helpful to discuss "DAG: Depth-Aware Guidance with Denoising Diffusion Probabilistic Models" [Kim et al.], which
- The paper is well-written and easy to follow.
- The proposed method provides a parameter-efficient way to model the joint image-condition distribution, which is more versatile for different conditioning tasks compared to specialized conditional methods.
- The authors provide sufficient experiments and comparisons for their method.
- Based on the provided results, the proposed method seems effective in modeling the joint image-condition distributions, and performing conditional generation
- Conditional generation using the proposed method requires performing multiple denoising paths, which makes inference computationally intensive compared to direct conditioning, especially for multiple conditions.
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Metaheuristic Optimization Algorithms Research · VLSI and FPGA Design Techniques
Methods: Diffusion · Balanced Selection
