A System for Automated Image Editing from Natural Language Commands
Jacqueline Brixey, Ramesh Manuvinakurike, Nham Le, Tuan Lai, Walter Chang, Trung Bui

TL;DR
This paper introduces a system that interprets natural language commands to automate image editing, using a crowdsourced corpus of over 6000 annotated requests and machine learning models to map each request to executable actions.
Contribution
The work presents a novel framework for translating natural language image editing requests into commands, along with a new annotated corpus and evaluation of multiple ML models.
Findings
LSTM, SVM, and bidirectional LSTM-CRF models perform best in detecting actions and entities.
A corpus of over 6000 natural language image editing requests was created and annotated.
The framework represents each request as actions and associated entities, which map to executable commands in an image editing program.
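As a rough illustration of how detected actions and entities might be turned into a command, the sketch below groups token-level tags into entity spans and attaches them to an action. The BIO tagging scheme, the label names ("attribute", "object"), the action name "adjust", and the EditCommand structure are all hypothetical choices made for this example; the paper summary does not specify the label set or the command format.

```python
from dataclasses import dataclass, field


@dataclass
class EditCommand:
    """Hypothetical container for one image editing command."""
    action: str
    entities: dict = field(default_factory=dict)


def tags_to_command(tokens, tags, action):
    """Group BIO-tagged tokens into entity spans for a single action.

    The BIO scheme and label names are assumptions for illustration only.
    """
    entities = {}
    label, span = None, []

    def flush():
        # Store the span collected so far under its label, if any.
        if label is not None:
            entities.setdefault(label, []).append(" ".join(span))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            label, span = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            span.append(token)
        else:
            # "O" tags and inconsistent "I-" tags close the current span.
            flush()
            label, span = None, []
    flush()
    return EditCommand(action=action, entities=entities)


tokens = ["increase", "the", "brightness", "of", "the", "sky"]
tags = ["O", "O", "B-attribute", "O", "B-object", "I-object"]
command = tags_to_command(tokens, tags, action="adjust")
```

Here the action is passed in directly; in a full system it would come from an utterance-level classifier, and the resulting structure would still need to be translated into calls to a specific image editing program.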
Abstract
This work presents the task of modifying images in an image editing program using written natural language commands. We utilize a crowdsourced corpus of over 6000 text requests to edit real-world images. We describe a novel framework composed of actions and entities that maps a user's natural language request to executable commands in an image editing program. We resolve previously labeled annotator disagreement through a voting process and complete annotation of the corpus. We experimented with different machine learning models and found that the LSTM, the SVM, and the bidirectional LSTM-CRF joint models perform best at detecting image editing actions and associated entities in a given utterance.
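The abstract mentions a voting process for resolving annotator disagreement without giving its details. A minimal sketch of one common approach, simple majority voting with ties left unresolved, is shown below. The function name, the tie-handling rule, and the example labels are assumptions for illustration and may differ from the procedure the authors used.

```python
from collections import Counter


def resolve_by_vote(labels):
    """Return the majority label, or None when there is no clear winner.

    Assumption: ties and empty inputs are left unresolved (None) so they
    can be sent back for further review.
    """
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # top two labels have the same count
    return ranked[0][0]


# Hypothetical action labels from three annotators for one request.
winner = resolve_by_vote(["crop", "crop", "delete"])
```

Returning None for ties keeps ambiguous items visible rather than silently picking an arbitrary label.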
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Sigmoid Activation · Tanh Activation · Support Vector Machine · Long Short-Term Memory
