Learning by Planning: Language-Guided Global Image Editing

Jing Shi; Ning Xu; Yihang Xu; Trung Bui; Franck Dernoncourt; Chenliang Xu

arXiv:2106.13156·cs.CV·June 25, 2021



TL;DR

This paper introduces a text-to-operation model for language-guided global image editing, translating vague language requests into interpretable, differentiable editing sequences, with a novel planning algorithm for training supervision.

Contribution

The paper proposes a new text-to-operation model with an operation planning algorithm, enabling interpretable and effective language-guided image editing from only target image supervision.

Findings

01 Outperforms previous GAN-based methods on the MA5k-Req and GIER datasets

02 Operates with interpretable, differentiable editing steps

03 Uses planned editing sequences as pseudo ground truth for stable training

Abstract

Recently, language-guided global image editing has drawn increasing attention with growing application potential. However, previous GAN-based methods are not only confined to domain-specific, low-resolution data but also lack interpretability. To overcome these difficulties, we develop a text-to-operation model that maps a vague editing request into a series of editing operations, e.g., changes to contrast, brightness, and saturation. Each operation is interpretable and differentiable. Furthermore, the only supervision in the task is the target image, which is insufficient for stable training of sequential decisions. Hence, we propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth. Comparison experiments on the newly collected MA5k-Req dataset and the GIER dataset show the advantages of our method.…
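To make the idea of planning an editing sequence from the target image more concrete, here is a minimal sketch in Python with NumPy. It is not the paper's algorithm: the operation formulas, the parameter grid, the function names (`brightness`, `contrast`, `saturation`, `plan`), and the greedy search strategy are all assumptions chosen for illustration. It only shows how a source image and a target image could be turned into a short list of (operation, parameter) pairs that might serve as pseudo ground truth.

```python
import numpy as np

# Hypothetical operation definitions (assumptions, not taken from the paper).
# Images are float arrays in [0, 1] with shape (H, W, 3).
def brightness(img, p):
    # Scale all pixel values by p.
    return np.clip(img * p, 0.0, 1.0)

def contrast(img, p):
    # Scale the deviation from the global mean by p.
    mean = img.mean()
    return np.clip((img - mean) * p + mean, 0.0, 1.0)

def saturation(img, p):
    # Scale the deviation from the per-pixel gray value by p.
    gray = img.mean(axis=2, keepdims=True)
    return np.clip((img - gray) * p + gray, 0.0, 1.0)

OPS = {"brightness": brightness, "contrast": contrast, "saturation": saturation}
PARAMS = np.linspace(0.5, 1.5, 21)  # assumed parameter grid, step 0.05

def plan(source, target, max_steps=3, tol=1e-3):
    """Greedy search: at each step, apply the (op, param) pair that most
    reduces the mean L1 distance to the target; stop when nothing helps."""
    current, sequence = source, []
    best_err = float(np.abs(current - target).mean())
    for _ in range(max_steps):
        if best_err < tol:
            break
        candidates = [
            (float(np.abs(fn(current, p) - target).mean()), name, float(p))
            for name, fn in OPS.items()
            for p in PARAMS
        ]
        err, name, p = min(candidates)
        if err >= best_err:
            break  # no candidate improves on the current image
        current, best_err = OPS[name](current, p), err
        sequence.append((name, p))
    return sequence, current

# Usage: recover a planned sequence for a synthetically brightened image.
rng = np.random.default_rng(0)
src = rng.uniform(0.1, 0.7, size=(4, 4, 3))
tgt = brightness(src, 1.2)
seq, out = plan(src, tgt)
print(seq)
```

Because each operation here is a simple closed-form function of its parameter, the same formulas could be written in an autograd framework so that gradients flow through the parameters; the grid search above does not use gradients and is only one of many possible search strategies.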


Code & Models

Repositories

jshi31/T2ONet (PyTorch, official)


Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques