DDP: Diffusion Model for Dense Visual Prediction
Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

TL;DR
This paper introduces DDP, a diffusion-based framework for dense visual prediction tasks that achieves state-of-the-art results without task-specific modifications, offering advantages like dynamic inference and uncertainty estimation.
Contribution
The paper presents DDP, a novel diffusion model framework that generalizes across multiple dense prediction tasks with high performance and simplicity, without requiring task-specific architecture changes.
Findings
Achieves 83.9 mIoU on Cityscapes for semantic segmentation
Attains 70.6 mIoU on nuScenes for BEV map segmentation
Reaches 0.05 REL (absolute relative error) in depth estimation on KITTI
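For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how mIoU and REL are commonly computed. The function names `mean_iou` and `abs_rel` are illustrative and not taken from the paper's code. Benchmark mIoU is usually accumulated over a whole dataset rather than per image, so this per-image version is only a toy.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union, averaged over classes that appear
    in either the predicted or the ground-truth label map."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

def abs_rel(pred_depth, gt_depth):
    """Absolute relative depth error, |pred - gt| / gt, averaged over
    pixels with valid (positive) ground-truth depth."""
    valid = gt_depth > 0
    diff = np.abs(pred_depth[valid] - gt_depth[valid])
    return float(np.mean(diff / gt_depth[valid]))
```

Higher is better for mIoU (often reported as a percentage), while lower is better for REL.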
Abstract
We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks with six diverse benchmarks. Without tricks, DDP achieves state-of-the-art or competitive performance on each task…
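The "noise-to-map" idea in the abstract can be illustrated with a small sampling loop. This is a generic sketch under stated assumptions, not the authors' implementation: it assumes a cosine noise schedule and a deterministic DDIM-style update, and `denoiser` is a hypothetical callable standing in for a learned, image-conditioned decoder. The names `cosine_alpha_bar` and `noise_to_map` are likewise illustrative.

```python
import numpy as np

def cosine_alpha_bar(t):
    """Cumulative signal level at time t in [0, 1] (assumed cosine schedule):
    1 means a clean map, 0 means pure noise."""
    return np.cos(0.5 * np.pi * t) ** 2

def noise_to_map(features, denoiser, num_steps, rng):
    """Start from Gaussian noise and refine it into a dense map.

    `denoiser(x_t, features, t)` is a hypothetical function returning an
    estimate of the clean map given the noisy map, image features, and time.
    """
    x = rng.standard_normal(features.shape)       # pure Gaussian noise
    times = np.linspace(1.0, 0.0, num_steps + 1)  # t = 1 (noise) to t = 0 (clean)
    for t_now, t_next in zip(times[:-1], times[1:]):
        a_now = cosine_alpha_bar(t_now)
        a_next = cosine_alpha_bar(t_next)
        x0 = denoiser(x, features, t_now)         # predicted clean map
        # Noise implied by the current sample and the clean-map prediction.
        eps = (x - np.sqrt(a_now) * x0) / np.sqrt(max(1.0 - a_now, 1e-8))
        # Deterministic move to the next, less noisy level.
        x = np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps
    return x
```

Because `num_steps` is an argument of the sampler rather than part of the model, the same denoiser can be run with fewer or more refinement steps at inference time. This is one plausible reading of the "dynamic inference" property mentioned in the abstract.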
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Image Retrieval and Classification Techniques
Methods: Diffusion
