FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution

Shuai Wang; Zexian Li; Tianhui Song; Xubin Li; Tiezheng Ge; Bo Zheng; Limin Wang

arXiv:2410.22655·cs.CV·October 31, 2024

TL;DR

FlowDCN introduces a convolution-based generative model with linear complexity capable of high-quality arbitrary-resolution image synthesis, outperforming transformer-based methods in speed, quality, and efficiency.

Contribution

The paper presents FlowDCN, a convolutional architecture built on deformable convolution blocks for efficient, high-quality image generation at arbitrary resolutions, surpassing transformer-based models in convergence speed, visual quality, and efficiency.

Findings

01. Achieves 4.30 sFID on the ImageNet 256x256 benchmark.

02. Outperforms transformer-based methods in convergence speed, visual quality, and efficiency.

03. Reduces parameters by 8% and FLOPs by 20% relative to transformer-based counterparts.

Abstract

Arbitrary-resolution image generation remains a challenging task in AIGC, as it requires handling varying resolutions and aspect ratios while maintaining high visual quality. Existing transformer-based diffusion methods suffer from quadratic computation cost and limited resolution extrapolation capabilities, making them less effective for this task. In this paper, we propose FlowDCN, a purely convolution-based generative model with linear time and memory complexity that can efficiently generate high-quality images at arbitrary resolutions. Equipped with a newly designed learnable group-wise deformable convolution block, FlowDCN offers greater flexibility and capability to handle different resolutions with a single model. FlowDCN achieves a state-of-the-art 4.30 sFID on the $256 \times 256$ ImageNet benchmark and comparable resolution extrapolation results, surpassing…
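
The quadratic-versus-linear contrast drawn in the abstract can be illustrated with a rough operation count. The sketch below is not taken from the paper: the patch size, channel width, and number of sampled points per position are assumed values chosen only for illustration, and the cost functions are coarse asymptotic estimates rather than measured FLOPs.

```python
def token_count(height, width, patch=16):
    """Number of spatial positions after patchifying (patch size is an assumed value)."""
    return (height // patch) * (width // patch)


def attention_cost(n_tokens, dim=64):
    """Rough cost of global self-attention: every position interacts with every other, O(N^2 * d)."""
    return n_tokens * n_tokens * dim


def local_sampling_cost(n_tokens, dim=64, kernel_points=9):
    """Rough cost of an operator that samples a fixed number of points per position, O(N * K * d)."""
    return n_tokens * kernel_points * dim


if __name__ == "__main__":
    for side in (256, 512, 1024):
        n = token_count(side, side)
        print(side, n, attention_cost(n), local_sampling_cost(n))
```

Under these assumptions, doubling the image side quadruples the number of positions, which multiplies the attention estimate by 16 but the fixed-point sampling estimate only by 4. That gap is what makes a convolution-style operator attractive when the target resolution is not fixed in advance.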

Code & Models

Models

🤗 wangsssssss/FlowDCN (model)

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques

Methods: Convolution · Deformable Convolution · Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings