A Simple Approach to Unifying Diffusion-based Conditional Generation

Xirui Li; Charles Herrmann; Kelvin C.K. Chan; Yinxiao Li; Deqing Sun; Chao Ma; Ming-Hsuan Yang

arXiv:2410.11439·cs.CV·April 8, 2025

TL;DR

This paper presents a simple, unified diffusion-based framework for diverse conditional image generation tasks, achieving comparable or better results than specialized or complex models with minimal additional parameters.

Contribution

The authors introduce a single, efficient diffusion model that unifies various conditional generation tasks without complex training or architectural modifications.

Findings

01 Comparable results to specialized methods

02 Better than prior unified approaches

03 Supports multi-signal conditional generation

Abstract

Recent progress in image generation has sparked research into controlling these models through condition signals, with various methods addressing specific challenges in conditional generation. Instead of proposing another specialized technique, we introduce a simple, unified framework to handle diverse conditional generation tasks involving a specific image-condition correlation. By learning a joint distribution over a correlated image pair (e.g. image and depth) with a diffusion model, our approach enables versatile capabilities via different inference-time sampling schemes, including controllable image generation (e.g. depth to image), estimation (e.g. image to depth), signal guidance, joint generation (image & depth), and coarse control. Previous attempts at unification often introduce significant complexity through multi-stage training, architectural modification, or increased…
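The idea of selecting a task purely through the inference-time sampling scheme can be illustrated with a small sketch. It assumes, following the reviewers' mention of independent timestep scheduling, that the image and the condition each carry their own noise level, so a task is defined by how the pair of timesteps evolves during sampling. The function name `timestep_pairs`, the task labels, the linear schedule, and the `coarse_level` value are all hypothetical and are not taken from the paper or its released code; signal guidance is not covered here.

```python
from typing import List, Tuple


def timestep_pairs(
    task: str,
    num_steps: int = 5,
    t_max: int = 1000,
    coarse_level: int = 400,
) -> List[Tuple[int, int]]:
    """Return (t_image, t_condition) for each sampling step.

    Illustrative only: the task names, the linear schedule, and
    coarse_level are assumptions, not the authors' implementation.
    """
    # Linearly spaced noise levels from t_max down to, but excluding, 0.
    descending = [round(t_max * (num_steps - i) / num_steps) for i in range(num_steps)]

    if task == "joint":
        # Image and condition are denoised together from pure noise.
        return [(t, t) for t in descending]
    if task == "cond_to_image":
        # e.g. depth to image: the condition is given, so it stays clean (t = 0).
        return [(t, 0) for t in descending]
    if task == "image_to_cond":
        # e.g. image to depth: the image is given, so it stays clean (t = 0).
        return [(0, t) for t in descending]
    if task == "coarse_control":
        # Assumed reading of coarse control: the condition is held at a
        # partial noise level so that it constrains the image only loosely.
        return [(t, coarse_level) for t in descending]
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for name in ("joint", "cond_to_image", "image_to_cond", "coarse_control"):
        print(name, timestep_pairs(name))
```

Under these assumptions a single jointly trained model serves every task; only the schedule passed to the sampler changes.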

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The main strengths are in their lightweight configuration, good reported performance, and novelty in using independent timestep scheduling.

- The writing of the paper is clear with comprehensive evaluations supporting the superiority of the proposed method.
- The model is overall lightweight in terms of the size and the training time, compared to previous image conditional add-ons, e.g., ControlNets. This makes the method application-friendly.
- Using disentangled noise level scheduling from Di…

Weaknesses

Although I believe the current version of the manuscript is above the acceptance threshold, there are some limitations that prevent me from recommending it for higher honors (e.g., Highlight/Oral).

- Although there are five image conditional models trained using the proposed framework, it seems that only three types (Depth, SoftEdge, Pose) are compared quantitatively. This asymmetry in the quantitative/qualitative demonstration makes the manuscript incomplete.
- Moreover, there are also other metrics that…

Reviewer 02 · Rating 5 · Confidence 4

Strengths

- Unified framework for handling diverse conditional generation tasks through a joint distribution approach
- Lightweight adaptation of existing diffusion models with minimal parameter overhead
- Clear empirical demonstration of the proposed framework

Weaknesses

1. **Missing Comparisons/References**
   * The paper lacks comparisons with several important recent methods in depth estimation, and this limits our understanding of where the method stands in relation to the current state-of-the-art in depth estimation.
     - ZoeDepth
     - Depth Anything
     - Depth Anything v2 (which is only used as an annotator in this work)
   * In addition, it will be helpful to discuss "DAG: Depth-Aware Guidance with Denoising Diffusion Probabilistic Models" [Kim et al.], which…

Reviewer 03 · Rating 8 · Confidence 4

Strengths

- The paper is well-written and easy to follow.
- The proposed method provides a parameter-efficient way to model the joint image-condition distribution, which is more versatile for different conditioning tasks compared to specialized conditional methods.
- The authors provide sufficient experiments and comparisons for their method.
- Based on the provided results, the proposed method seems effective in modeling the joint image-condition distributions, and performing conditional generation

Weaknesses

- Conditional generation using the proposed method requires running multiple denoising paths, which makes inference computationally intensive compared to direct conditioning, especially with multiple conditions.

Code & Models

Models

🤗
lixirui142/unicon
model

Videos

A Simple Approach to Unifying Diffusion-based Conditional Generation· slideslive

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Metaheuristic Optimization Algorithms Research · VLSI and FPGA Design Techniques

Methods: Diffusion · Balanced Selection