Yuan-TecSwin: A text conditioned Diffusion model with Swin-transformer blocks
Shaohua Wu, Tong Yu, Shenling Wang, Xudong Zhao

TL;DR
Yuan-TecSwin is a text-conditioned diffusion model that uses Swin-transformer blocks to improve long-range semantic understanding. It reports state-of-the-art image generation quality on ImageNet together with improved inference performance.
Contribution
The paper presents Yuan-TecSwin, a novel diffusion model replacing CNN blocks with Swin-transformer blocks for better non-local feature extraction in text-to-image synthesis.
Findings
Achieves a state-of-the-art FID score of 1.37 on ImageNet.
Improves inference performance by 10% by using an adapted time step to search in different diffusion stages.
Generated images are indistinguishable from human-painted ones in tests.
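For readers unfamiliar with the metric, the FID (Fréchet Inception Distance) cited above is conventionally defined as follows. This is the standard definition, not a statement of the paper's exact evaluation protocol (sample count, resolution, and reference statistics are not given in this summary). The symbols are the usual ones: \mu_r, \Sigma_r are the mean and covariance of Inception features of real images, and \mu_g, \Sigma_g those of generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower values indicate that the two feature distributions are closer.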
Abstract
Diffusion models have shown remarkable capacity in image synthesis, building on a U-shaped architecture with convolutional neural networks (CNNs) as basic blocks. The locality of the convolution operation may limit the model's ability to understand long-range semantic information. To address this issue, we propose Yuan-TecSwin, a text-conditioned diffusion model with Swin-transformer blocks. The Swin-transformer blocks replace the CNN blocks in the encoder and decoder to improve non-local modeling in feature extraction and image restoration. Text-image alignment is improved with a well-chosen text encoder, effective use of the text embedding, and careful design of how the text condition is incorporated. Using an adapted time step to search in different diffusion stages, inference performance is further improved by 10%. Yuan-TecSwin achieves the…
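The abstract's contrast between convolutional locality and Swin-transformer blocks can be made concrete with a small sketch of the general shifted-window idea from the Swin Transformer design. This is an illustration only, not the Yuan-TecSwin implementation: the function names, the 4 x 4 grid, and the window size of 2 are hypothetical choices made for readability, and the attention computation itself is omitted.

```python
# Illustrative sketch only; not the Yuan-TecSwin implementation.
# Grid size, window size, and function names are hypothetical.

def window_partition(grid, window):
    """Split an H x W grid into non-overlapping window x window tiles."""
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    tiles = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            tiles.append([grid[r][c]
                          for r in range(top, top + window)
                          for c in range(left, left + window)])
    return tiles


def cyclic_shift(grid, shift):
    """Roll the grid up and left by `shift` positions, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


# 4 x 4 grid of token ids 0..15, window size 2, shift of window // 2 = 1.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular windows: self-attention would be computed inside each tile.
regular = window_partition(grid, 2)

# Shifted windows: the next block regroups tokens across the old borders.
shifted = window_partition(cyclic_shift(grid, 1), 2)

print(regular[0])  # tokens grouped in the first regular window
print(shifted[0])  # tokens grouped in the first shifted window
```

In this toy grid, tokens 5 and 6 fall in different regular windows but share the first shifted window, which is how alternating the two partitions lets information travel beyond a single window. In the original Swin design, windows that wrap around the border after the cyclic shift are additionally masked so that tokens which are not spatially adjacent do not attend to each other; that masking is left out here.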
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Neural Network Applications
