A Benchmark and Baseline for Language-Driven Image Editing

Jing Shi; Ning Xu; Trung Bui; Franck Dernoncourt; Zheng Wen; Chenliang Xu

arXiv:2010.02330·cs.CV·October 7, 2020·5 cites

Open Access

TL;DR

This paper introduces a new dataset and baseline method for language-driven image editing that supports both local and global modifications, aiming to advance the field towards more flexible and interpretable editing tools.

Contribution

The paper presents a novel dataset supporting local and global edits with annotations, and a baseline method that predicts operation parameters for interpretable image editing.

Findings

01 The baseline method performs well on challenging user data.

02 The approach is highly interpretable due to its modular design.

03 The dataset supports both local and global image editing tasks.

Abstract

Language-driven image editing can substantially reduce laborious image editing work and is friendly to photography novices. However, most similar work can only deal with a specific image domain or can only do global retouching. To address this task, we first present a new language-driven image editing dataset that supports both local and global editing, with editing operation and mask annotations. We also propose a baseline method that fully utilizes these annotations. Our method treats each editing operation as a sub-module and can automatically predict operation parameters. The approach not only performs well on challenging user data but is also highly interpretable. We believe our work, including both the benchmark and the baseline, will advance the image editing area towards a more general and free-form level.
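
The idea of treating each editing operation as a sub-module with its own parameter, applied either globally or inside a mask, can be illustrated with a minimal sketch. This is not the paper's implementation: the operation names (`brightness`, `contrast`), the helpers `apply_operation` and `apply_edits`, and the grayscale list-of-lists image format are all assumptions made for illustration. In the paper the parameters are predicted by the model from the language request; here they are supplied by hand.

```python
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[float]]  # hypothetical format: grayscale pixels in [0, 1]
Mask = List[List[int]]     # 1 = edit this pixel, 0 = leave it unchanged


def clamp(value: float) -> float:
    """Keep a pixel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, value))


def brightness(pixel: float, amount: float) -> float:
    """Shift a pixel by a fixed amount."""
    return clamp(pixel + amount)


def contrast(pixel: float, factor: float) -> float:
    """Scale a pixel's distance from mid-gray."""
    return clamp((pixel - 0.5) * factor + 0.5)


# Each operation is a separate sub-module that takes one parameter.
# The registry and its entries are illustrative, not the paper's operation set.
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "brightness": brightness,
    "contrast": contrast,
}


def apply_operation(
    image: Image, name: str, param: float, mask: Optional[Mask] = None
) -> Image:
    """Apply one operation globally (mask is None) or locally (inside the mask)."""
    op = OPERATIONS[name]
    return [
        [op(p, param) if mask is None or mask[y][x] else p
         for x, p in enumerate(row)]
        for y, row in enumerate(image)
    ]


def apply_edits(
    image: Image, edits: List[Tuple[str, float, Optional[Mask]]]
) -> Image:
    """Run a sequence of (operation, parameter, mask) edits in order."""
    for name, param, mask in edits:
        image = apply_operation(image, name, param, mask)
    return image


# Hand-written stand-in for what a model would predict from a request such as
# "brighten the left side, then increase the contrast of the whole image".
image = [[0.2, 0.4], [0.6, 0.8]]
left_column = [[1, 0], [1, 0]]
edited = apply_edits(
    image,
    [("brightness", 0.1, left_column), ("contrast", 2.0, None)],
)
```

Because every step is a named operation with an explicit parameter and an optional mask, the resulting edit sequence can be read and adjusted directly, which is the sense in which a modular design of this kind is interpretable.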

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques