NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN

Minheng Ni; Chenfei Wu; Haoyang Huang; Daxin Jiang; Wangmeng Zuo; Nan Duan

arXiv:2202.05009 · cs.CV · February 11, 2022 · 1 citation

Open Access

TL;DR

NÜWA-LIP introduces a novel language-guided image inpainting method that leverages defect-free VQGAN and multi-perspective sequence modeling to improve visual quality and robustness, outperforming recent baselines.

Contribution

The paper proposes NÜWA-LIP, combining defect-free VQGAN with multi-perspective sequence modeling to address receptive spreading and information loss in language-guided image inpainting.

Findings

01

DF-VQGAN is more robust than VQGAN.

02

NÜWA-LIP outperforms recent baselines on open-domain benchmarks.

03

The method effectively preserves non-defective regions while filling in defective areas.
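
The constraint in finding 03 (fill defective areas, leave everything else untouched) can be illustrated with a plain mask composite. This is only a sketch of the task constraint, not the paper's DF-VQGAN or MP-S2S; the function name `composite`, the nested-list image layout, and the convention that 1 marks a defective pixel are all assumptions made for illustration.

```python
def composite(original, generated, mask):
    """Return an image that takes generated pixels where mask is 1
    and keeps the original pixels where mask is 0.

    All three arguments are equally sized 2D lists (hypothetical layout).
    """
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]


# Toy 2x2 grayscale image: only the top-right pixel is marked defective.
original = [[10, 20], [30, 40]]
generated = [[1, 2], [3, 4]]
mask = [[0, 1], [0, 0]]

result = composite(original, generated, mask)
print(result)  # [[10, 2], [30, 40]]
```

Only the masked pixel changes; the three non-defective pixels are carried over exactly, which is the property the finding describes.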

Abstract

Language guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping non-defective regions unchanged. However, the encoding process of existing models suffers from either receptive spreading of defective regions or information loss of non-defective regions, giving rise to visually unappealing inpainting results. To address these issues, this paper proposes NÜWA-LIP, which combines defect-free VQGAN (DF-VQGAN) with multi-perspective sequence to sequence (MP-S2S). In particular, DF-VQGAN introduces relative estimation to control receptive spreading and adopts symmetrical connections to protect information. MP-S2S further enhances visual information from complementary perspectives, including both low-level pixels and high-level tokens. Experiments show that DF-VQGAN is more robust than VQGAN. To evaluate the inpainting…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning

Methods: Inpainting