One Diffusion to Generate Them All

Duong H. Le; Tuan Pham; Sangho Lee; Christopher Clark; Aniruddha Kembhavi; Stephan Mandt; Ranjay Krishna; Jiasen Lu

arXiv:2411.16318 · cs.CV · June 16, 2025

TL;DR

OneDiffusion is a unified large-scale diffusion model capable of performing diverse image synthesis and understanding tasks, including conditional generation, image editing, and multi-view analysis, with a simple training approach that enhances scalability and generalization.

Contribution

It introduces a versatile diffusion framework that supports multiple tasks without specialized architectures, enabling scalable multi-task training and smooth adaptation to various resolutions.

Findings

01 Achieves competitive results across diverse tasks

02 Supports multi-view generation and camera pose estimation

03 Operates effectively with relatively small training datasets

Abstract

We introduce OneDiffusion, a versatile, large-scale diffusion model that seamlessly supports bidirectional image synthesis and understanding across diverse tasks. It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps, while also handling tasks like image deblurring, upscaling, and reverse processes such as depth estimation and segmentation. Additionally, OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs. Our model takes a straightforward yet effective approach by treating all tasks as frame sequences with varying noise scales during training, allowing any frame to act as a conditioning image at inference time. Our unified training framework removes the need for specialized architectures, supports scalable multi-task training, and adapts smoothly to any resolution,…

Code & Models

Repositories

lehduong/onediffusion (PyTorch, official)

Models

lehduong/OneDiffusion (Hugging Face model; 7 downloads, 42 likes)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Multimodal Machine Learning Applications

Methods: Diffusion