Action-based image editing guided by human instructions

Maria Mihaela Trusca; Mingxiao Li; Marie-Francine Moens

arXiv:2412.04558 · cs.CV · February 5, 2025

TL;DR

This paper introduces a dynamic image editing approach guided by human action instructions. It modifies the positions or postures of objects so that they depict an action, while preserving the objects' visual properties.

Contribution

A novel model, trained on datasets derived from video frames, that learns to recognize contrastive action discrepancies and performs action-based image editing with strong reasoning capabilities.
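
To make "recognizing contrastive action discrepancies" concrete, here is a minimal sketch of one generic way such a signal could be trained. It is an assumption for illustration, not the authors' loss: the visual change is represented as the difference between "after" and "before" image embeddings, and an InfoNCE-style loss asks that change to match the correct action text embedding rather than the other candidate actions. The function names, the toy vectors, and the temperature value are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; the small epsilon avoids division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def action_contrastive_loss(
    before: Sequence[float],
    after: Sequence[float],
    action_embs: List[Sequence[float]],
    target: int,
    temperature: float = 0.1,
) -> float:
    """Hypothetical InfoNCE loss over candidate action embeddings.

    The 'discrepancy' is the embedding difference after - before; the
    correct action (index `target`) is the positive and every other
    candidate action acts as a negative.
    """
    delta = [a - b for a, b in zip(after, before)]
    logits = [cosine(delta, e) / temperature for e in action_embs]
    # Numerically stable log-sum-exp over all candidate actions.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of the correct action.
    return log_z - logits[target]
```

For a rightward displacement (`before=[0, 0]`, `after=[1, 0]`) with the candidate actions "move right" `[1, 0]` and "move up" `[0, 1]`, the loss is close to zero when the target is "move right" and about 10 (at temperature 0.1) when the target is "move up". Contrasting against other actions, rather than only scoring the correct one, is what pushes the model to become sensitive to which action the instruction names.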

Findings

1. Significant improvements in action-based image editing quality
2. Effective recognition of contrastive action discrepancies
3. Strong reasoning ability in generating the final scene of an action

Abstract

Text-based image editing is typically approached as a static task that involves operations such as inserting, deleting, or modifying elements of an input image based on human instructions. In this paper, we aim to make this task dynamic by incorporating actions: we modify the positions or postures of objects in the image to depict different actions while maintaining the objects' visual properties. To implement this challenging task, we propose a new model that is sensitive to action text instructions by learning to recognize contrastive action discrepancies. The model is trained on new datasets built by extracting frames from videos that show the visual scenes before and after an action. We show substantial improvements in image editing using action-based text instructions and high reasoning capabilities that…
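
The abstract describes building training data by extracting frames from videos that show the scene before and after an action. The sketch below shows one simple way such (source image, target image, instruction) triples could be assembled, assuming each video comes with annotated action segments given as start and end frame indices plus a text instruction. The segment format, the `margin` parameter, and all names are hypothetical; the paper's actual extraction procedure may differ.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class ActionSegment:
    """Hypothetical annotation of one action inside a video clip."""
    start: int        # index of the first frame of the action
    end: int          # index of the last frame of the action (inclusive)
    instruction: str  # action text, e.g. "open the door"


@dataclass(frozen=True)
class EditExample:
    source: Any       # frame showing the scene before the action
    target: Any       # frame showing the scene after the action
    instruction: str


def build_edit_pairs(
    frames: Sequence[Any],
    segments: Sequence[ActionSegment],
    margin: int = 0,
) -> List[EditExample]:
    """Turn annotated action segments into before/after editing examples.

    `margin` moves the source frame earlier and the target frame later,
    so the pair captures the scene just outside the action boundaries.
    """
    examples: List[EditExample] = []
    for seg in segments:
        before = seg.start - margin
        after = seg.end + margin
        # Skip segments whose context falls outside the clip, and
        # single-frame segments that would yield no visible change.
        if before < 0 or after >= len(frames) or before >= after:
            continue
        examples.append(EditExample(frames[before], frames[after], seg.instruction))
    return examples
```

With ten frames and segments covering frames 2 to 5 ("open the door") and 8 to 9 ("sit down"), `margin=1` keeps only the first segment (frames 1 and 6), because the second would need a frame past the end of the clip; with `margin=0` both segments yield a pair.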

Taxonomy

Topics: Advanced Vision and Imaging