Remember What You have drawn: Semantic Image Manipulation with Memory
Xiangxi Shi, Zhonghua Wu, Guosheng Lin, Jianfei Cai, Shafiq Joty

TL;DR
This paper introduces MIM-Net, a memory-based network for semantic image manipulation guided by natural language, achieving more realistic and accurate results by focusing on relevant regions and learning robust memory representations.
Contribution
The paper presents a novel memory-based network with a two-stage training process and a target localization unit for improved semantic image manipulation.
Findings
Outperforms existing methods on four datasets
Effectively focuses on manipulated regions
Learns robust memory representations
Abstract
Image manipulation with natural language, which aims to manipulate images under the guidance of language descriptions, has been a challenging problem in the fields of computer vision and natural language processing (NLP). A number of efforts have been made on this task, but their performance is still far from generating realistic manipulated images that conform to the text. Therefore, in this paper, we propose a memory-based Image Manipulation Network (MIM-Net), where a set of memories learned from images is introduced to synthesize the texture information with the guidance of the textual description. We propose a two-stage network with an additional reconstruction stage to learn the latent memories efficiently. To avoid unnecessary background changes, we propose a Target Localization Unit (TLU) to focus the manipulation on the region mentioned by the text.…
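The abstract does not say how the memories are read or how the TLU restricts edits, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the memory is read with softmax attention driven by a text query vector, and that the localized region is applied as a soft mask that blends edited pixels into the original image. The names `read_memory` and `blend` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, memory):
    """Assumed read step: attention-weighted sum of memory slots.

    query  -- text feature vector (list of floats)
    memory -- list of memory slots, each the same length as query
    """
    # Dot-product similarity between the query and every slot.
    scores = [sum(q * k for q, k in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * slot[d] for w, slot in zip(weights, memory)) for d in range(dim)]


def blend(original, edited, mask):
    """Assumed localization step: keep the original where mask is 0,
    take the edited value where mask is 1, and mix in between."""
    return [
        [m * e + (1.0 - m) * o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]
```

With a mask of zeros the output equals the input image, which is the property the abstract attributes to the TLU: regions not mentioned by the text stay unchanged.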
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection
