CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Songhua Liu, Zhenxiong Tan, Xinchao Wang

TL;DR
This paper introduces CLEAR, a convolution-like local attention mechanism that linearizes pre-trained diffusion transformers, significantly reducing computational complexity while maintaining high-quality image generation.
Contribution
The paper proposes a novel linear attention method called CLEAR, enabling efficient fine-tuning of pre-trained diffusion transformers with minimal data and computational resources.
Findings
Reduces attention computation by 99.5% when generating 8K-resolution images
Accelerates image generation by 6.3 times at 8K resolution
Maintains comparable quality to original models after fine-tuning
Abstract
Diffusion Transformers (DiT) have become a leading architecture in image generation. However, the quadratic complexity of attention mechanisms, which are responsible for modeling token-wise relationships, results in significant latency when generating high-resolution images. To address this issue, we aim at a linear attention mechanism in this paper that reduces the complexity of pre-trained DiTs to linear. We begin our exploration with a comprehensive summary of existing efficient attention mechanisms and identify four key factors crucial for successful linearization of pre-trained DiTs: locality, formulation consistency, high-rank attention maps, and feature integrity. Based on these insights, we introduce a convolution-like local attention strategy termed CLEAR, which limits feature interactions to a local window around each query token, and thus achieves linear complexity. Our…
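The local-window idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `local_window_mask` and `local_attention`, the Euclidean-distance window on a 2D token grid, and the `radius` parameter are assumptions made for illustration. The sketch builds a dense mask for clarity, so it does not itself run in linear time; it only shows which query-key pairs a local window would keep.

```python
import numpy as np

def local_window_mask(height, width, radius):
    """Boolean (N, N) mask, N = height * width.

    Entry [i, j] is True when key token j lies within `radius`
    (Euclidean distance on the 2D grid) of query token i.
    The window shape is an assumption for this sketch.
    """
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=-1)   # (N, 2)
    diff = coords[:, None, :] - coords[None, :, :]         # (N, N, 2)
    dist2 = (diff ** 2).sum(axis=-1)                       # squared distances
    return dist2 <= radius ** 2

def local_attention(q, k, v, mask):
    """Softmax attention restricted to the pairs allowed by `mask`."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)               # drop out-of-window keys
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical usage on a 4x4 token grid with 8-dimensional features.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
out = local_attention(q, k, v, local_window_mask(4, 4, radius=1))
```

Because the number of keys inside the window is bounded by a constant that depends only on `radius`, the useful work per query does not grow with image resolution. An implementation that skips out-of-window pairs instead of masking them would therefore scale linearly in the number of tokens.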
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Cell Image Analysis Techniques
Methods: Softmax · Attention Is All You Need
