Emu Edit: Precise Image Editing via Recognition and Generation Tasks

Shelly Sheynin; Adam Polyak; Uriel Singer; Yuval Kirstain; Amit Zohar; Oron Ashual; Devi Parikh; Yaniv Taigman

arXiv:2311.10089·cs.CV·November 17, 2023·2 cites

Open Access

TL;DR

Emu Edit is a multi-task image editing model that achieves state-of-the-art results in instruction-based editing. It is trained on a diverse set of tasks, all formulated as generative tasks, and uses learned task embeddings to steer generation toward the correct edit type. It also generalizes to new tasks from few labeled examples.

Contribution

The paper introduces Emu Edit, a novel multi-task image editing framework that unifies various editing tasks as generative problems and enhances performance with learned task embeddings.

Findings

01. Achieves state-of-the-art performance in instruction-based image editing.

02. Successfully generalizes to new tasks with few labeled examples.

03. Provides a new benchmark with seven diverse image editing tasks.

Abstract

Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle with accurately executing user instructions. We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing. To develop Emu Edit we train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks, all of which are formulated as generative tasks. Additionally, to enhance Emu Edit's multi-task learning abilities, we provide it with learned task embeddings which guide the generation process towards the correct edit type. Both these elements are essential for Emu Edit's outstanding performance. Furthermore, we show that Emu Edit…
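
The idea of a learned task embedding can be illustrated with a minimal sketch. It is not the authors' implementation. The task names, the vector size, the random initialization, and the choice to add the task vector to the instruction features are all assumptions made for illustration; the abstract says only that learned task embeddings guide generation toward the correct edit type.

```python
import random

# Hypothetical task names, loosely following the categories named in the abstract.
TASKS = ["region_based_editing", "free_form_editing", "computer_vision"]
EMBED_DIM = 4  # toy size, chosen only for illustration


def init_task_embeddings(tasks, dim, seed=0):
    """Create one vector per task; in a real model these would be trained."""
    rng = random.Random(seed)
    return {task: [rng.gauss(0.0, 1.0) for _ in range(dim)] for task in tasks}


def condition_on_task(instruction_features, task, task_embeddings):
    """Add the task's vector to the instruction features, element-wise.

    Addition is an assumption of this sketch; the source does not say how
    the embedding is injected into the generator.
    """
    task_vec = task_embeddings[task]
    if len(task_vec) != len(instruction_features):
        raise ValueError("feature and task-embedding sizes must match")
    return [f + t for f, t in zip(instruction_features, task_vec)]


task_embeddings = init_task_embeddings(TASKS, EMBED_DIM)
features = [0.0] * EMBED_DIM  # stand-in for encoded instruction text
conditioned = condition_on_task(features, "free_form_editing", task_embeddings)
```

In this toy setup the same instruction features yield a different conditioning vector for each task, which is the property a task embedding is meant to provide.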

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cell Image Analysis Techniques