DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling

Yuang Ai; Qihang Fan; Xuefeng Hu; Zhenheng Yang; Ran He; Huaibo Huang

arXiv:2505.11196·cs.CV·September 23, 2025

TL;DR

This paper introduces DiCo, a convolution-based diffusion model that replaces self-attention with efficient convolutional modules, achieving comparable or better performance with significantly reduced computational costs.

Contribution

The paper proposes DiCo, a convolutional diffusion model that replaces the costly self-attention of diffusion transformers with convolution and adds a channel attention mechanism to enhance feature diversity.
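To make the cost argument concrete, the sketch below compares rough multiply-accumulate (MAC) counts for one self-attention layer against a convolutional block. It is a back-of-the-envelope estimate, not a measurement from the paper: the block layout (two 1x1 convolutions around a 3x3 depthwise convolution), the function names, and the width of 1024 are all assumptions chosen for illustration.

```python
def attention_macs(num_tokens: int, dim: int) -> int:
    """Approximate MACs for one self-attention layer (single head, no MLP)."""
    projections = 4 * num_tokens * dim * dim           # Q, K, V and output projections
    score_and_mix = 2 * num_tokens * num_tokens * dim  # QK^T plus the weighted sum over V
    return projections + score_and_mix


def conv_block_macs(num_tokens: int, dim: int, kernel: int = 3) -> int:
    """Approximate MACs for 1x1 conv -> depthwise kxk conv -> 1x1 conv (assumed layout)."""
    pointwise = 2 * num_tokens * dim * dim             # two 1x1 convolutions
    depthwise = num_tokens * dim * kernel * kernel     # one kxk filter per channel
    return pointwise + depthwise


# Hypothetical width; the token count grows with image resolution.
for side in (16, 32, 64):
    n = side * side
    ratio = attention_macs(n, 1024) / conv_block_macs(n, 1024)
    print(f"{side}x{side} tokens: attention/conv MAC ratio = {ratio:.2f}")
```

The attention term grows quadratically in the number of tokens while the convolutional block grows linearly, so the ratio widens as resolution increases. Wall-clock speedups such as those reported under Findings also depend on kernels and hardware, which this count ignores.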

Findings

01. DiCo-XL achieves an FID of 2.05 on ImageNet at 256×256.

02. DiCo models are 2.7x to 3.1x faster than DiT-XL/2.

03. Purely convolutional DiCo performs well on text-to-image tasks.

Abstract

Diffusion Transformer (DiT), a promising diffusion model for visual generation, demonstrates impressive performance but incurs significant computational overhead. Intriguingly, analysis of pre-trained DiT models reveals that global self-attention is often redundant, predominantly capturing local patterns, which highlights the potential for more efficient alternatives. In this paper, we revisit convolution as an alternative building block for constructing efficient and expressive diffusion models. However, naively replacing self-attention with convolution typically results in degraded performance. Our investigations attribute this performance gap to the higher channel redundancy in ConvNets compared to Transformers. To resolve this, we introduce a compact channel attention mechanism that promotes the activation of more diverse channels, thereby enhancing feature diversity. This leads to…
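The abstract names a compact channel attention mechanism without giving its form here. One common way to build such a gate is global average pooling, followed by a pointwise projection and a sigmoid that rescales each channel. The NumPy sketch below assumes that form; the function name, tensor shapes, and random weights are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def compact_channel_attention(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Rescale each channel of x, shaped (C, H, W), by a gate from global statistics.

    weight (C, C) and bias (C,) stand in for a learned 1x1 convolution.
    """
    pooled = x.mean(axis=(1, 2))           # global average pooling -> (C,)
    logits = weight @ pooled + bias        # a 1x1 conv on a 1x1 map is a matrix-vector product
    gate = 1.0 / (1.0 + np.exp(-logits))   # sigmoid -> per-channel weights in (0, 1)
    return x * gate[:, None, None]         # broadcast the gate over H and W


# Illustrative input: 8 channels on a 4x4 feature map with random "learned" weights.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))
weight = rng.standard_normal((8, 8)) * 0.1
bias = np.zeros(8)
out = compact_channel_attention(features, weight, bias)
print(out.shape)
```

Because the gate is computed from a single pooled vector per feature map, its cost is independent of spatial resolution, which is what makes this kind of channel reweighting cheap relative to self-attention.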

Code & Models

Repositories

shallowdream204/dico (Official)

Models

shallowdream204/DiCo (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Layer Normalization · Diffusion · Byte Pair Encoding · Label Smoothing · Adam · Softmax