EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models
Kun Wang, Donglin Di, Tonghua Su, Lei Fan

TL;DR
This paper introduces EFDiT, a diffusion transformer model that improves fine-grained image generation by integrating semantic information from multiple class levels and enhancing image detail through super-resolution techniques.
Contribution
The paper proposes a tiered embedder and ProAttention mechanism to better incorporate semantic information and improve image detail in diffusion-based fine-grained image generation.
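As a rough illustration of the tiered-embedder idea, the sketch below conditions on a fine-grained label by combining a superclass embedding with a child-class embedding. The summary does not say how EFDiT combines the two levels, so the element-wise sum, the class name `TieredEmbedder`, the `child_to_super` mapping, and all sizes are assumptions made for illustration, not the authors' implementation.

```python
import random


class TieredEmbedder:
    """Hypothetical sketch: one embedding table per class level, summed per label."""

    def __init__(self, num_super, num_child, dim, child_to_super, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # child_to_super[i] is the superclass index of child class i (assumed given).
        self.child_to_super = child_to_super
        # Small random tables stand in for learned embeddings.
        self.super_table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                            for _ in range(num_super)]
        self.child_table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                            for _ in range(num_child)]

    def embed(self, child_id):
        """Return the conditioning vector for one child class."""
        super_vec = self.super_table[self.child_to_super[child_id]]
        child_vec = self.child_table[child_id]
        # Assumption: combine the two levels by element-wise addition.
        return [s + c for s, c in zip(super_vec, child_vec)]


# Three child classes; the first two share superclass 0.
embedder = TieredEmbedder(num_super=2, num_child=3, dim=4, child_to_super=[0, 0, 1])
vec_a = embedder.embed(0)
vec_b = embedder.embed(1)
```

Under this assumption, child classes that share a superclass receive the same superclass component and differ only in their child component, which is one plausible way to keep related fine-grained classes close but distinguishable.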
Findings
Outperforms state-of-the-art fine-tuning methods on benchmarks
Effectively reduces semantic entanglement in generated images
Enhances image details using super-resolution during perceptual generation
Abstract
Diffusion models are highly regarded for their controllability and the diversity of the images they generate. However, class-conditional generation methods based on diffusion models often focus on common categories. In large-scale fine-grained image generation, semantic entanglement and insufficient detail in the generated images persist. This paper introduces a tiered embedder for fine-grained image generation, which integrates semantic information from both superclasses and child classes, allowing the diffusion model to better incorporate semantic information and reduce semantic entanglement. To address insufficient detail in fine-grained images, we introduce super-resolution during the perceptual information generation stage, enhancing the detailed features of fine-grained images through…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Image Enhancement Techniques
