Text as Neural Operator: Image Manipulation by Text Instruction

Tianhao Zhang; Hung-Yu Tseng; Lu Jiang; Weilong Yang; Honglak Lee; Irfan Essa

arXiv:2008.04556·cs.CV·November 30, 2021

TL;DR

This paper introduces a GAN-based approach that treats text as neural operators to enable complex, multi-object image editing guided by natural language instructions, improving fidelity and semantic relevance.

Contribution

The paper proposes a method that treats text as neural operators which locally modify image features, so that a natural-language instruction can add, remove, or change objects in a multi-object image.
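To make "text as a neural operator that locally modifies image features" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and is not taken from the google/tim-gan repository. It assumes, purely for illustration, that an instruction has already been encoded into two hypothetical vectors: `where_query`, which scores how strongly each spatial location is targeted, and `how_delta`, which is the change to apply there. The function name `apply_text_operator` and the sigmoid gating are also assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def apply_text_operator(features, where_query, how_delta):
    """Edit an H x W grid of C-dimensional feature vectors.

    where_query and how_delta are hypothetical text-derived vectors
    (assumptions for this sketch, not the paper's actual modules).
    Returns the edited grid and the soft spatial mask that was used.
    """
    edited, mask = [], []
    for row in features:
        new_row, mask_row = [], []
        for vec in row:
            # Soft "where" score in (0, 1) for this location.
            m = sigmoid(dot(vec, where_query))
            mask_row.append(m)
            # "How": add the text-derived change, scaled by the mask,
            # so untargeted locations stay nearly unchanged.
            new_row.append([v + m * d for v, d in zip(vec, how_delta)])
        edited.append(new_row)
        mask.append(mask_row)
    return edited, mask


# Toy 2 x 2 feature map with 2 channels; only the top-left location
# aligns with the query, so only it should receive the edit.
features = [
    [[10.0, 0.0], [-10.0, 0.0]],
    [[-10.0, 0.0], [-10.0, 0.0]],
]
where_query = [1.0, 0.0]
how_delta = [0.0, 5.0]

edited, mask = apply_text_operator(features, where_query, how_delta)
```

In this toy case the mask is close to 1 at the top-left location and close to 0 elsewhere, so the change in `how_delta` lands almost entirely on that one location. A learned model would produce both vectors from the instruction text and decode the edited features back into an image.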

Findings

01 Outperforms recent baselines on three datasets

02 Generates images with higher fidelity and semantic relevance

03 Enhances image retrieval performance

Abstract

In recent years, text-guided image manipulation has gained increasing attention in the multimedia and computer vision communities. The input to conditional image generation has evolved from image-only to multimodal. In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects. The inputs of the task are multimodal, including (1) a reference image and (2) an instruction in natural language that describes the desired modifications to the image. We propose a GAN-based method to tackle this problem. The key idea is to treat text as neural operators that locally modify the image feature. We show that the proposed model performs favorably against recent strong baselines on three public datasets. Specifically, it generates images of greater fidelity and semantic relevance, and when used as an image…

Code & Models

Repositories

google/tim-gan (PyTorch, official)
