Fine-grained Cross-modal Fusion based Refinement for Text-to-Image Synthesis

Haoran Sun; Yang Wang; Haipeng Liu; Biao Qian

arXiv:2302.08706·cs.CV·February 21, 2023

TL;DR

This paper introduces FF-GAN, a novel text-to-image synthesis model that enhances semantic consistency and detail in generated images through fine-grained text-image fusion and global semantic refinement.

Contribution

The paper proposes a fine-grained text-image fusion block (FF-Block) and a Global Semantic Refinement (GSR) module that make fuller use of the textual description and improve image quality in text-to-image synthesis.

Findings

01. Outperforms state-of-the-art methods on the CUB-200 and COCO datasets.
02. Produces images with higher semantic consistency and detail.
03. Fuses fine-grained text features effectively into visual generation.

Abstract

Text-to-image synthesis refers to generating visually realistic and semantically consistent images from given textual descriptions. Previous approaches generate an initial low-resolution image and then refine it to high resolution. Despite remarkable progress, these methods do not fully utilize the given text and can generate images that mismatch it, especially when the text description is complex. We propose a novel Fine-grained text-image Fusion based Generative Adversarial Network, dubbed FF-GAN, which consists of two modules: the Fine-grained text-image Fusion Block (FF-Block) and Global Semantic Refinement (GSR). The proposed FF-Block integrates an attention block and several convolution layers to effectively fuse the fine-grained word-context features into the corresponding visual features, in which the text information is fully used to refine the initial image with…
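
The word-level fusion idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' FF-Block: it is a generic dot-product attention over word features, written in plain Python under the assumption that image regions and words are already embedded as vectors of the same dimension. The function names (`attend_to_words`, `fuse_words_into_regions`) and the toy data are hypothetical, and the convolution layers that the abstract says follow the attention block are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_words(region, words):
    """Build a word-context vector for one image-region feature.

    Assumption: `region` and every entry of `words` share one dimension,
    so a plain dot product can serve as the relevance score.
    """
    scores = [sum(r * w for r, w in zip(region, word)) for word in words]
    weights = softmax(scores)
    dim = len(words[0])
    context = [
        sum(weight * word[d] for weight, word in zip(weights, words))
        for d in range(dim)
    ]
    return context, weights


def fuse_words_into_regions(regions, words):
    """Concatenate each region feature with its word-context vector.

    A real model would pass the result through learned layers; plain
    concatenation only stands in for that step here.
    """
    fused = []
    for region in regions:
        context, _ = attend_to_words(region, words)
        fused.append(region + context)
    return fused


# Toy data (hypothetical): two image regions, two word embeddings.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[5.0, 0.0], [0.0, 5.0]]
fused = fuse_words_into_regions(regions, words)
```

Each region ends up weighted toward the word whose embedding it most resembles, which is the intuition behind fusing word-context features into the matching visual features.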

Code & Models

Repositories

haoranhfut/ff-gan (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Convolution