AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

Yiyang Ma; Huan Yang; Bei Liu; Jianlong Fu; Jiaying Liu

arXiv:2209.03160 · cs.CV · September 9, 2022

TL;DR

AI Illustrator introduces a prompt-based cross-modal framework leveraging pre-trained models to translate complex textual descriptions into visually appealing images, enhancing automatic book illustration with semantic accuracy and style adaptation.

Contribution

This work presents a novel framework combining CLIP and StyleGAN for translating raw descriptions into images without external paired data, and introduces a new benchmark for evaluation.

Findings

01 Outperforms existing methods in handling complex descriptions
02 Successfully generates semantically consistent images
03 Provides a new benchmark with 200 descriptions

Abstract

AI illustrator aims to automatically design visually appealing images for books to provoke rich thoughts and emotions. To achieve this goal, we propose a framework for translating raw descriptions with complex semantics into semantically corresponding images. The main challenge lies in the complexity of the semantics of raw descriptions, which may be hard to visualize (e.g., "gloomy" or "Asian"). Existing methods usually struggle to handle such descriptions. To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) that leverages two powerful pre-trained models, CLIP and StyleGAN. Our framework consists of two components: a projection module from Text Embeddings to Image Embeddings based on prompts, and an adapted image generation module built on StyleGAN which takes Image Embeddings as inputs and is trained by combined…
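The two-component data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class names, dimensions, and random linear maps below are hypothetical placeholders that only show how a text embedding would pass through a projection step and then a generation step. Real CLIP and StyleGAN models, the prompt-based projection, and the training losses are not reproduced here.

```python
import math
import random

EMBED_DIM = 8   # hypothetical; real CLIP embeddings are much larger
LATENT_DIM = 4  # hypothetical stand-in for a StyleGAN latent size


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ProjectionModule:
    """Placeholder for component 1: text embedding -> image embedding."""

    def __init__(self, rng):
        self.weights = random_matrix(EMBED_DIM, EMBED_DIM, rng)

    def __call__(self, text_embedding):
        # Assumption: a plain linear map followed by normalization.
        return l2_normalize(matvec(self.weights, text_embedding))


class GenerationModule:
    """Placeholder for component 2: image embedding -> generator latent code."""

    def __init__(self, rng):
        self.weights = random_matrix(LATENT_DIM, EMBED_DIM, rng)

    def __call__(self, image_embedding):
        # A real system would feed this latent code to a StyleGAN generator.
        return matvec(self.weights, image_embedding)


def illustrate(text_embedding, projector, generator):
    """Chain the two components: text embedding -> image embedding -> latent."""
    image_embedding = projector(text_embedding)
    return generator(image_embedding)


rng = random.Random(0)
projector = ProjectionModule(rng)
generator = GenerationModule(rng)
# Random unit vector standing in for a CLIP text embedding.
fake_text_embedding = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)])
latent = illustrate(fake_text_embedding, projector, generator)
```

The sketch keeps the two stages as separate callables so that either placeholder could be swapped for a real model without changing `illustrate`.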

Code & Models

Repositories

researchmm/ai_illustrator (PyTorch, official)

Taxonomy

Methods: StyleGAN · Adaptive Instance Normalization · Dense Connections · Convolution · Feedforward Network · R1 Regularization · Contrastive Language-Image Pre-training