Entity-Level Text-Guided Image Manipulation

Yikai Wang, Jianan Wang, Guansong Lu, Hang Xu, Zhenguo Li, Wei Zhang, and Yanwei Fu

arXiv:2302.11383 · cs.CV · February 23, 2023 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces SeMani, a novel framework for real-world, entity-level text-guided image manipulation that accurately edits and merges entities based on text descriptions while preserving irrelevant regions.

Contribution

SeMani is the first framework to perform entity-level text-guided image manipulation in real-world scenarios, combining semantic alignment with advanced generative models for precise editing.

Findings

01. SeMani outperforms baseline methods in accuracy and flexibility.

02. SeMani effectively distinguishes entity-relevant regions.

03. SeMani achieves zero-shot manipulation on real datasets.

Abstract

Existing text-guided image manipulation methods aim to modify the appearance of the image or to edit a few objects in a virtual or simple scenario, which is far from practical applications. In this work, we study a novel task of text-guided image manipulation at the entity level in the real world (eL-TGIM). The task imposes three basic requirements: (1) to edit the entity to be consistent with the text descriptions, (2) to preserve the entity-irrelevant regions, and (3) to merge the manipulated entity into the image naturally. To this end, we propose an elegant framework, dubbed SeMani, for the Semantic Manipulation of real-world images, which can not only edit the appearance of entities but also generate new entities corresponding to the text guidance. To solve eL-TGIM, SeMani decomposes the task into two phases: the semantic alignment phase and the image manipulation phase. In the…
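
Requirement (2) in the abstract, preserving entity-irrelevant regions, can be illustrated with a minimal sketch. This is not SeMani's actual procedure, which the truncated abstract does not spell out. It is a generic hard-mask composite, assuming a binary mask of entity-relevant pixels is already available. The names `composite`, `Image`, and `Mask` are hypothetical and chosen only for this illustration.

```python
from typing import List

# Hypothetical types for illustration only (not from the paper or its repo).
Image = List[List[float]]  # grayscale image as rows of pixel intensities
Mask = List[List[int]]     # 1 = entity-relevant pixel, 0 = entity-irrelevant pixel


def composite(original: Image, generated: Image, mask: Mask) -> Image:
    """Take generated pixels where mask is 1 and original pixels where mask is 0."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    result = []
    for orig_row, gen_row, mask_row in zip(original, generated, mask):
        if not (len(orig_row) == len(gen_row) == len(mask_row)):
            raise ValueError("rows must have the same length")
        # Entity-irrelevant pixels (mask 0) are copied from the original unchanged.
        result.append([g if m else o for o, g, m in zip(orig_row, gen_row, mask_row)])
    return result


original = [[0.1, 0.2], [0.3, 0.4]]
generated = [[0.9, 0.8], [0.7, 0.6]]
mask = [[0, 1], [0, 0]]  # assume only the top-right pixel belongs to the edited entity

edited = composite(original, generated, mask)
```

A hard paste like this guarantees that pixels outside the mask are untouched, but it does nothing for requirement (3), merging the edited entity into the image naturally, which would need some form of blending or generation conditioned on the surrounding context.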

Code & Models

Repositories

yikai-wang/semani
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Multimodal Machine Learning Applications

Methods: Diffusion