Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
Benno Krojer, Dheeraj Vattikonda, Luis Lara, Varun Jampani, Eva Portelance, Christopher Pal, Siva Reddy

TL;DR
This paper introduces the AURORA dataset for action- and reasoning-centric image editing and shows that a model fine-tuned on it outperforms previous methods on diverse editing tasks. It also proposes a new automatic evaluation metric.
Contribution
The paper presents a high-quality, curated dataset for action and reasoning-based image editing, along with a new benchmark and a state-of-the-art editing model.
Findings
A model fine-tuned on AURORA outperforms previous models on diverse editing tasks.
The paper proposes a new automatic metric focusing on discriminative understanding.
Human evaluation shows significant improvements over prior methods.
Abstract
An image editing model should be able to perform diverse edits, ranging from object replacement, changing attributes or style, to performing actions or movement, which require many forms of reasoning. Current general instruction-guided editing models have significant shortcomings with action and reasoning-centric edits. Object, attribute or stylistic changes can be learned from visually static datasets. On the other hand, high-quality data for action and reasoning-centric edits is scarce and has to come from entirely different sources that cover e.g. physical dynamics, temporality and spatial reasoning. To this end, we meticulously curate the AURORA Dataset (Action-Reasoning-Object-Attribute), a collection of high-quality training data, human-annotated and curated from videos and simulation engines. We focus on a key aspect of quality training data: triplets (source image, prompt,…
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Robotic Path Planning Algorithms
