SegAttnGAN: Text to Image Generation with Segmentation Attention
Yuchuan Gou, Qiancheng Wu, Minghao Li, Bo Gong, Mei Han

TL;DR
SegAttnGAN introduces segmentation attention into text-to-image generation, improving image realism and quantitative scores by leveraging segmentation data for guidance during training.
Contribution
The paper presents a novel generative network that incorporates segmentation information, enhancing the quality of generated images in text-to-image synthesis.
Findings
Achieved an Inception Score of 4.84 on the CUB dataset
Achieved an Inception Score of 3.52 on the Oxford-102 dataset
The self-attention variant, which uses generated segmentation data instead of dataset masks, achieves similarly high-quality results
Abstract
In this paper, we propose a novel generative network (SegAttnGAN) that utilizes additional segmentation information for the text-to-image synthesis task. As the segmentation data introduced to the model provides useful guidance on the generator training, the proposed model can generate images with better realism quality and higher quantitative measures compared with previous state-of-the-art methods. We achieved an Inception Score of 4.84 on the CUB dataset and 3.52 on the Oxford-102 dataset. Besides, we tested the self-attention SegAttnGAN, which uses generated segmentation data instead of masks from datasets for attention, and achieved similar high-quality results, suggesting that our model can be adapted for the text-to-image synthesis task.
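For readers unfamiliar with the metric quoted above, the Inception Score is commonly defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's predicted class distribution for a generated image and p(y) is the average of those distributions over all generated images. The sketch below is a minimal illustration of that formula, not the authors' evaluation code. The function name `inception_score` and the plain-list input format are assumptions made for this example. In practice the probabilities come from a pretrained Inception network and the score is often averaged over several splits, which this sketch omits.

```python
import math


def inception_score(probs):
    """Illustrative Inception Score from per-image class probabilities.

    probs: list of rows; each row is a distribution p(y|x) over classes
           (assumed here to be already computed by some classifier).
    Returns exp(mean over images of KL(p(y|x) || p(y))).
    """
    n = len(probs)
    num_classes = len(probs[0])

    # Marginal class distribution p(y): the mean of all per-image rows.
    marginal = [sum(row[k] for row in probs) / n for k in range(num_classes)]

    # Sum of KL(p(y|x) || p(y)) over images; terms with p = 0 contribute 0.
    total_kl = 0.0
    for row in probs:
        for p, m in zip(row, marginal):
            if p > 0:
                total_kl += p * math.log(p / m)

    return math.exp(total_kl / n)


if __name__ == "__main__":
    # Confident predictions spread evenly over two classes: highest score.
    print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
    # Identical, uninformative predictions: lowest score.
    print(inception_score([[0.5, 0.5], [0.5, 0.5]]))
```

Under this definition the score lies between 1 and the number of classes: it rises when each image is classified confidently (sharp p(y|x)) and when the predicted classes are diverse across images (broad p(y)).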
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Video Analysis and Summarization
