Text-to-Image GAN with Pretrained Representations

Xiaozhou You; Jian Zhang

arXiv:2501.00116·cs.CV·January 3, 2025

TL;DR

TIGER introduces a novel GAN architecture utilizing pretrained vision models and high-capacity fusion blocks to achieve faster, more accurate text-to-image synthesis, outperforming existing methods on standard and zero-shot tasks.

Contribution

The paper proposes TIGER, a text-to-image GAN with a vision-empowered discriminator and a high-capacity generator, enhancing performance and speed over prior models.

Findings

01. Achieves state-of-the-art FID scores on COCO and CUB datasets.

02. Demonstrates superior zero-shot synthesis with fewer parameters.

03. Runs inference faster than diffusion and autoregressive models.

Abstract

Generating desired images conditioned on given text descriptions has received considerable attention. Recently, diffusion models and autoregressive models have demonstrated outstanding expressivity and gradually replaced GANs as the favored architectures for text-to-image synthesis. However, they still face two obstacles: slow inference and expensive training. To achieve more powerful and faster text-to-image synthesis in complex scenes, we propose TIGER, a text-to-image GAN with pretrained representations. Specifically, we propose a vision-empowered discriminator and a high-capacity generator. (i) The vision-empowered discriminator absorbs the complex scene understanding ability and the domain generalization ability from pretrained vision models to enhance model performance. Unlike previous works, we explore stacking multiple pretrained models in our discriminator to…

Taxonomy

Topics: Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques · AI in cancer detection

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion