Style Generation: Image Synthesis based on Coarsely Matched Texts
Mengyao Cui, Zhe Zhu, Shao-Ping Lu, Yulu Yang

TL;DR
This paper introduces a two-stage GAN framework that stylizes an input image using coarsely matched text as guidance, a setting to which existing text-to-image synthesis methods, built around explicit textual instructions, adapt poorly.
Contribution
It proposes the task of text-based style generation and develops a two-stage GAN model to generate and refine image styles from coarse textual descriptions.
Findings
Effective style generation from coarse text guidance
Improved image stylization quality demonstrated in experiments
One existing dataset is re-filtered and one new dataset is collected for text-based style generation
Abstract
Previous text-to-image synthesis algorithms typically use explicit textual instructions to generate/manipulate images accurately, but they have difficulty adapting to guidance in the form of coarsely matched texts. In this work, we attempt to stylize an input image using such coarsely matched text as guidance. To tackle this new problem, we introduce a novel task called text-based style generation and propose a two-stage generative adversarial network: the first stage generates the overall image style with a sentence feature, and the second stage refines the generated style with a synthetic feature, which is produced by a multi-modality style synthesis module. We re-filter one existing dataset and collect a new dataset for the task. Extensive experiments and ablation studies are conducted to validate our framework. The practical potential of our work is demonstrated by various…
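The two-stage data flow described in the abstract can be sketched roughly as follows. This is only an illustrative toy under stated assumptions, not the authors' model: there are no neural networks or adversarial training here, the image is a small grid of grayscale values in [0, 1], and the function names (stage_one, synthesize_style_feature, stage_two, stylize) and the arithmetic inside them are hypothetical placeholders. In particular, the assumption that the synthetic feature combines the stage-one output with the sentence feature is a guess based on the phrase "multi-modality"; the abstract does not specify the module's inputs.

```python
def clamp(value):
    """Keep a pixel value inside the [0, 1] range."""
    return min(1.0, max(0.0, value))


def stage_one(image, sentence_feature):
    """Stage 1 placeholder: apply an overall style derived from the sentence feature.

    Here the 'style' is just a global brightness offset (an assumption).
    """
    offset = sum(sentence_feature) / len(sentence_feature)
    return [[clamp(p + offset) for p in row] for row in image]


def synthesize_style_feature(coarse_image, sentence_feature):
    """Hypothetical stand-in for the multi-modality style synthesis module.

    Assumption: it mixes image statistics with the text feature.
    """
    pixels = [p for row in coarse_image for p in row]
    image_mean = sum(pixels) / len(pixels)
    return [image_mean * w for w in sentence_feature]


def stage_two(coarse_image, synthetic_feature):
    """Stage 2 placeholder: refine the stage-one result with the synthetic feature."""
    gain = 1.0 + sum(synthetic_feature) / len(synthetic_feature)
    return [[clamp(p * gain) for p in row] for row in coarse_image]


def stylize(image, sentence_feature):
    """Chain the stages: coarse style first, then refinement."""
    coarse = stage_one(image, sentence_feature)
    synthetic = synthesize_style_feature(coarse, sentence_feature)
    return stage_two(coarse, synthetic)


if __name__ == "__main__":
    toy_image = [[0.2, 0.4], [0.6, 0.8]]   # 2x2 grayscale image
    toy_sentence = [0.1, -0.05, 0.2]        # made-up sentence embedding
    print(stylize(toy_image, toy_sentence))
```

The point of the sketch is the ordering: the sentence feature alone drives the first pass, and the second pass consumes a feature computed after the first pass has produced a result.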
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation
