Action-based image editing guided by human instructions
Maria Mihaela Trusca, Mingxiao Li, Marie-Francine Moens

TL;DR
This paper introduces a dynamic image editing approach guided by human action instructions, enabling the modification of object positions or postures to depict actions while preserving the objects' visual properties.
Contribution
A novel model that learns to recognize contrastive action discrepancies and is trained on video-derived datasets to perform action-based image editing with strong reasoning capabilities.
Findings
Significant improvements in action-based image editing quality
Effective recognition of contrastive action discrepancies
Strong reasoning ability in generating the final scene that results from an action
Abstract
Text-based image editing is typically approached as a static task that involves operations such as inserting, deleting, or modifying elements of an input image based on human instructions. Given the static nature of this task, we aim in this paper to make it dynamic by incorporating actions. Specifically, we modify the positions or postures of objects in the image to depict different actions while maintaining the visual properties of the objects. To address this challenging task, we propose a new model that is sensitive to action text instructions by learning to recognize contrastive action discrepancies. The model is trained on new datasets built by extracting frames from videos that show the visual scenes before and after an action. We show substantial improvements in image editing using action-based text instructions and high reasoning capabilities that…
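The abstract says the training data is built by extracting frames that show a scene before and after an action. One way such (before, after, instruction) triplets could be assembled is sketched below. This is a minimal illustration, not the authors' pipeline. It assumes each video comes with temporal action annotations (start frame, end frame, textual instruction), and it works on frame indices only, so video decoding is not shown. The names `ActionSegment`, `EditExample`, `build_edit_pairs`, and the `margin` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List, NamedTuple


class ActionSegment(NamedTuple):
    # Hypothetical annotation format: one labeled action inside a video.
    start: int        # first frame index of the action
    end: int          # last frame index of the action (inclusive)
    instruction: str  # action text, e.g. "open the drawer"


@dataclass(frozen=True)
class EditExample:
    # One training triplet: source frame, target frame, and the action text.
    before_frame: int
    after_frame: int
    instruction: str


def build_edit_pairs(segments: List[ActionSegment], num_frames: int,
                     margin: int = 2) -> List[EditExample]:
    """Pick one frame shortly before and one shortly after each action.

    `margin` (an assumption) moves the chosen frames away from the action
    boundaries so the 'before' frame predates the motion and the 'after'
    frame shows its completed result.
    """
    examples = []
    for seg in segments:
        if seg.start > seg.end:
            continue  # malformed annotation: skip it
        before = seg.start - margin
        after = seg.end + margin
        # Skip actions whose padded window falls outside the video.
        if before < 0 or after >= num_frames:
            continue
        examples.append(EditExample(before, after, seg.instruction))
    return examples


if __name__ == "__main__":
    segs = [
        ActionSegment(10, 40, "open the drawer"),
        ActionSegment(1, 20, "pick up the cup"),   # too close to the start
        ActionSegment(90, 99, "close the lid"),    # too close to the end
    ]
    for ex in build_edit_pairs(segs, num_frames=100):
        print(ex)
```

With a 100-frame video and `margin=2`, only the first annotation yields a triplet (frames 8 and 42). The other two are dropped because their padded windows leave the video. Discarding boundary cases, rather than clamping them, is a design choice that keeps the "before" and "after" frames clearly separated from the action itself.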
Taxonomy
Topics: Advanced Vision and Imaging
