Remember What You have drawn: Semantic Image Manipulation with Memory

Xiangxi Shi; Zhonghua Wu; Guosheng Lin; Jianfei Cai; Shafiq Joty

arXiv:2107.12579·cs.CV·July 28, 2021·5 cites

Open Access

TL;DR

This paper introduces MIM-Net, a memory-based network for semantic image manipulation guided by natural language, achieving more realistic and accurate results by focusing on relevant regions and learning robust memory representations.

Contribution

The paper presents a novel memory-based network with a two-stage training process and a target localization unit for improved semantic image manipulation.

Findings

01. Outperforms existing methods on four datasets

02. Effectively focuses on manipulated regions

03. Learns robust memory representations

Abstract

Image manipulation with natural language, which aims to manipulate images under the guidance of language descriptions, is a challenging problem in computer vision and natural language processing (NLP). A number of methods have been proposed for this task, but their results are still far from realistic, text-conforming manipulated images. In this paper, we therefore propose a memory-based Image Manipulation Network (MIM-Net), in which a set of memories learned from images is used to synthesize texture information under the guidance of the textual description. We propose a two-stage network with an additional reconstruction stage to learn the latent memories efficiently. To avoid unnecessary background changes, we propose a Target Localization Unit (TLU) that focuses the manipulation on the region mentioned in the text.…
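The two ideas named in the abstract, reading texture information from a learned memory under text guidance and restricting edits to the text-relevant region, can be illustrated with a toy sketch. This is not the MIM-Net implementation. The dot-product attention read, the soft-mask blending, and the names `read_memory` and `blend` are all assumptions made for illustration; the abstract does not specify how the memories are addressed or how the TLU output is applied.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, memory):
    """Assumed attention read: weight each memory slot by its
    dot product with a text-derived query, then average the slots."""
    scores = [sum(q * v for q, v in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * slot[d] for w, slot in zip(weights, memory))
            for d in range(dim)]


def blend(original, generated, mask):
    """Assumed localization step: keep the original value where the
    mask is 0 and use the generated value where the mask is 1."""
    return [m * g + (1.0 - m) * o
            for o, g, m in zip(original, generated, mask)]


# Hypothetical toy data: two memory slots and a query close to the first.
memory = [[1.0, 0.0], [0.0, 1.0]]
texture = read_memory([10.0, 0.0], memory)

# Only the middle "pixel" is marked as text-relevant by the mask.
edited = blend([0.2, 0.4, 0.6], [0.9, 0.9, 0.9], [0.0, 1.0, 0.0])
```

In this toy setup the read is dominated by the first memory slot, and the blend leaves the two unmasked values untouched, which is the behaviour the abstract attributes to avoiding unnecessary background changes.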

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection