DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Canyu Zhao; Yanlong Sun; Mingyu Liu; Huanyi Zheng; Muzhi Zhu; Zhiyue Zhao; Hao Chen; Tong He; Chunhua Shen

arXiv:2502.17157 · cs.CV · October 10, 2025

TL;DR

DICEPTION is a diffusion-based generalist visual perception model that handles multiple tasks with limited training data and compute, reaching performance comparable to state-of-the-art single-task specialist models.

Contribution

The paper introduces DICEPTION, a generalist diffusion model that re-purposes pre-trained text-to-image diffusion models for diverse perception tasks with low data and computational costs.

Findings

01 Achieves results on par with SAM-vit-h using only 0.06% of its training data (600K vs. 1B pixel-level annotated images)

02 Adapts to new tasks by fine-tuning on as few as 50 images

03 Subtle classifier-free guidance improves depth and normal estimation
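
Classifier-free guidance is usually described as blending a conditional and an unconditional prediction with a scalar weight. The sketch below shows that standard blending rule on toy values. It is not taken from the DICEPTION code: the function name `apply_cfg`, the toy predictions, and the scale value are illustrative assumptions, as is reading "subtle" as a guidance scale only slightly above 1.

```python
from typing import List


def apply_cfg(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Blend conditional and unconditional predictions with classifier-free guidance.

    scale = 0 returns the unconditional prediction, scale = 1 returns the
    conditional one, and scale > 1 extrapolates past the conditional prediction.
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy per-element predictions standing in for a denoiser's output (hypothetical values).
cond_pred = [1.0, 2.0]
uncond_pred = [0.0, 1.0]

# A hypothetical mild guidance scale, slightly above 1.
print(apply_cfg(cond_pred, uncond_pred, 1.5))
```

With a scale of exactly 1 the rule reduces to the plain conditional prediction, so a small scale above 1 changes the output only slightly.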

Abstract

This paper's primary objective is to develop a robust generalist perception model capable of addressing multiple tasks under constraints of computational resources and limited training data. We leverage text-to-image diffusion models pre-trained on billions of images and successfully introduce our DICEPTION, a visual generalist model. Exhaustive evaluations demonstrate that DICEPTION effectively tackles diverse perception tasks, even achieving performance comparable to SOTA single-task specialist models. Specifically, we achieve results on par with SAM-vit-h using only 0.06% of their data (e.g., 600K vs. 1B pixel-level annotated images). We designed comprehensive experiments on architectures and input paradigms, demonstrating that the key to successfully re-purposing a single diffusion model for multiple perception tasks lies in maximizing the preservation of the pre-trained model's…
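
The 0.06% figure follows directly from the two dataset sizes quoted in the abstract. A minimal check of that arithmetic, where the helper name `data_fraction_percent` is a hypothetical one introduced only for this illustration:

```python
def data_fraction_percent(used: int, reference: int) -> float:
    """Return `used` as a percentage of `reference`."""
    return 100.0 * used / reference


# Dataset sizes as quoted in the abstract: 600K vs. 1B annotated images.
diception_images = 600_000
sam_images = 1_000_000_000

print(f"{data_fraction_percent(diception_images, sam_images):.2f}%")  # prints "0.06%"
```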

Code & Models

Repositories

aim-uofa/Diception (official)

Models

Canyu/DICEPTION (Hugging Face model)

Taxonomy

Methods: Diffusion