CAISE: Conversational Agent for Image Search and Editing
Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal

TL;DR
This paper introduces CAISE, a dataset and system for conversational image search and editing, in which an automated assistant carries out image search and editing requests through grounded dialogue. A baseline model shows promising results.
Contribution
The paper presents what the authors believe to be the first dataset for conversational image search and editing, along with a baseline model and a tool for real-world application, advancing automated image editing assistance.
Findings
First dataset with conversational image search and editing annotations
Baseline generator-extractor model for command selection
Public release of code and dataset for future research
Abstract
Demand for image editing has been increasing as users' desire for expression is also increasing. However, for most users, image editing tools are not easy to use since the tools require certain expertise in photo effects and have complex interfaces. Hence, users might need someone to help edit their images, but having a personal dedicated human assistant for every user is impossible to scale. For that reason, an automated assistant system for image editing is desirable. Additionally, users want more image sources for diverse image editing works, and integrating an image search functionality into the editing tool is a potential remedy for this demand. Thus, we propose a dataset of an automated Conversational Agent for Image Search and Editing (CAISE). To our knowledge, this is the first dataset that provides conversational image search and editing annotations, where the agent holds a…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
