Improving Cross-modal Alignment for Text-Guided Image Inpainting

Yucheng Zhou; Guodong Long

arXiv:2301.11362 · cs.CV · January 30, 2023 · 1 citation

TL;DR

This paper introduces a novel text-guided image inpainting model that leverages vision-language pre-trained models and cross-modal alignment techniques to improve the quality of restored images, achieving state-of-the-art results.

Contribution

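The work proposes CMA, a new model that improves cross-modal alignment in text-guided image inpainting (TGII) through distillation and adversarial training. It targets two limitations of earlier methods: computation concentrated in visual encoding, and cross-modal fusion that misses fine-grained alignment between text and image.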

Findings

1. Achieves state-of-the-art performance on benchmark datasets.
2. Effectively restores complex missing regions guided by text.
3. Improves cross-modal alignment through distillation techniques.

Abstract

Text-guided image inpainting (TGII) aims to restore the missing regions of a damaged image based on a given text. Existing methods rely on a strong vision encoder and a cross-modal fusion model to integrate cross-modal features. However, these methods allocate most of their computation to visual encoding and only light computation to modeling modality interactions. Moreover, they apply cross-modal fusion to deep features, which ignores fine-grained alignment between text and image. Recently, vision-language pre-trained models (VLPMs), which encapsulate rich cross-modal alignment knowledge, have advanced most multimodal tasks. In this work, we propose a novel model for TGII by improving cross-modal alignment (CMA). The CMA model consists of a VLPM as a vision-language encoder, an image generator, and global-local discriminators. To explore cross-modal alignment knowledge for image restoration,…
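
The inpainting setup and the global-local discriminator idea named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a binary mask in which 1 marks a missing pixel, and it assumes, as in common global-local designs, that the global discriminator sees the whole restored image while the local one sees a crop around the missing region. The names `composite`, `mask_bbox`, and `discriminator_views` are hypothetical, and plain nested lists stand in for image tensors.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values
Mask = List[List[int]]     # 1 marks a missing pixel, 0 a known pixel (assumed convention)


def composite(damaged: Image, generated: Image, mask: Mask) -> Image:
    """Keep known pixels from the damaged image; fill missing ones from the generator output."""
    return [
        [g if m == 1 else d for d, g, m in zip(d_row, g_row, m_row)]
        for d_row, g_row, m_row in zip(damaged, generated, mask)
    ]


def mask_bbox(mask: Mask) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) bounding the missing region; bottom/right are exclusive."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, m in enumerate(row) if m]
    if not rows:
        raise ValueError("mask has no missing pixels")
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def discriminator_views(image: Image, mask: Mask) -> Tuple[Image, Image]:
    """Global view is the whole image; local view is the crop around the missing region."""
    top, left, bottom, right = mask_bbox(mask)
    local = [row[left:right] for row in image[top:bottom]]
    return image, local


# Toy 4x4 example: the central 2x2 block is missing.
damaged = [[1.0, 1.0, 1.0, 1.0],
           [1.0, 0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0, 1.0],
           [1.0, 1.0, 1.0, 1.0]]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
generated = [[5.0] * 4 for _ in range(4)]  # stand-in for a generator output

restored = composite(damaged, generated, mask)
global_view, local_view = discriminator_views(restored, mask)
```

In a real model the two views would be passed to separate discriminator networks, and the generator would be conditioned on the text; both are omitted here because the abstract does not specify them.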

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Mycobacterium research and diagnosis

Methods: Inpainting