Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

Benno Krojer; Dheeraj Vattikonda; Luis Lara; Varun Jampani; Eva Portelance; Christopher Pal; Siva Reddy

arXiv:2407.03471 · cs.CV · October 18, 2024

TL;DR

This paper introduces the AURORA dataset for action- and reasoning-centric image editing and shows that a model fine-tuned on it outperforms previous methods on diverse editing tasks, as judged by human evaluation and a new automatic metric.

Contribution

The paper presents a high-quality, curated dataset for action and reasoning-based image editing, along with a new benchmark and a state-of-the-art editing model.

Findings

01 A model fine-tuned on AURORA outperforms previous models on diverse editing tasks.

02 The paper proposes a new automatic metric that focuses on discriminative understanding.

03 Human evaluation shows significant improvements over prior methods.

Abstract

An image editing model should be able to perform diverse edits, ranging from object replacement, changing attributes or style, to performing actions or movement, which require many forms of reasoning. Current general instruction-guided editing models have significant shortcomings with action and reasoning-centric edits. Object, attribute or stylistic changes can be learned from visually static datasets. On the other hand, high-quality data for action and reasoning-centric edits is scarce and has to come from entirely different sources that cover e.g. physical dynamics, temporality and spatial reasoning. To this end, we meticulously curate the AURORA Dataset (Action-Reasoning-Object-Attribute), a collection of high-quality training data, human-annotated and curated from videos and simulation engines. We focus on a key aspect of quality training data: triplets (source image, prompt,…

Code & Models

Repositories

McGill-NLP/AURORA (PyTorch, official)

Models

McGill-NLP/AURORA (Hugging Face model)

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Robotic Path Planning Algorithms
