Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback
Hui Wu, Yupeng Gao, Xiaoxiao Guo, Ziad Al-Halah, Steven Rennie, Kristen Grauman, Rogerio Feris

TL;DR
The paper introduces the Fashion IQ dataset and a transformer-based interactive image retrieval system to enhance conversational fashion shopping assistants, enabling more natural and effective image search through dialogue.
Contribution
It presents the first fashion dataset that pairs human-generated captions distinguishing similar garment images with visual attribute labels, and it introduces a novel transformer-based user simulator and image retriever for dialog-based retrieval.
Findings
Improved performance over state-of-the-art methods in dialog-based image retrieval
Dataset enables more natural conversational shopping interactions
Transformer-based model effectively integrates visual attributes and dialogue history
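The idea of a retriever that combines visual attributes, image features, user feedback, and dialog history can be pictured with a much simpler stand-in. The Python sketch below is not the paper's transformer. It fuses per-turn image, attribute, and feedback vectors with a weighted sum (the weight `w_attr` is an assumption), keeps a running mean over turns in place of attention over the dialog history, and ranks a toy gallery by cosine similarity. The helper names, item names, and feature vectors are all hypothetical; a real system would obtain features from learned image and text encoders.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_turn(image_feat: Sequence[float], attr_feat: Sequence[float],
              feedback_feat: Sequence[float], w_attr: float = 0.5) -> Vector:
    """Hypothetical per-turn fusion: image + weighted attributes + feedback."""
    return l2_normalize([i + w_attr * a + f
                         for i, a, f in zip(image_feat, attr_feat, feedback_feat)])


class DialogState:
    """Running mean of fused turns; a simple stand-in for attention over dialog history."""

    def __init__(self, dim: int) -> None:
        self.total = [0.0] * dim
        self.turns = 0

    def update(self, turn_vec: Sequence[float]) -> Vector:
        self.total = [t + x for t, x in zip(self.total, turn_vec)]
        self.turns += 1
        return l2_normalize([t / self.turns for t in self.total])


def rank_gallery(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery item names sorted by cosine similarity to the query, best first."""
    q = l2_normalize(query)
    scores = {name: sum(a * b for a, b in zip(q, l2_normalize(vec)))
              for name, vec in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-d features: dim 0 ~ "red", dim 1 ~ "blue", dim 2 ~ "shirt-like" (all hypothetical).
gallery = {"red_dress": [1.0, 0.0, 0.0],
           "blue_dress": [0.0, 1.0, 0.0],
           "red_shirt": [0.7, 0.0, 0.7]}

state = DialogState(dim=3)
# Turn 1: the shopper is shown the blue dress and gives feedback like "I want it in red".
query = state.update(fuse_turn(image_feat=[0.0, 1.0, 0.0],
                               attr_feat=[0.0, 0.0, 0.0],
                               feedback_feat=[2.0, 0.0, 0.0]))
print(rank_gallery(query, gallery))  # prints ['red_dress', 'red_shirt', 'blue_dress']
```

Because the feedback vector is weighted more heavily than the shown image in this toy setup, the red items move ahead of the blue dress; a learned model would set such trade-offs from data rather than by hand.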
Abstract
Conversational interfaces for the detail-oriented retail fashion domain are more natural, expressive, and user friendly than classical keyword-based search interfaces. In this paper, we introduce the Fashion IQ dataset to support and advance research on interactive fashion image retrieval. Fashion IQ is the first fashion dataset to provide human-generated captions that distinguish similar pairs of garment images together with side-information consisting of real-world product descriptions and derived visual attribute labels for these images. We provide a detailed analysis of the characteristics of the Fashion IQ data, and present a transformer-based user simulator and interactive image retriever that can seamlessly integrate visual attributes with image features, user feedback, and dialog history, leading to improved performance over the state of the art in dialog-based image retrieval.…
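The annotation described in the abstract attaches human-written captions to a pair of similar garment images, describing how the wanted (target) garment differs from a reference one, along with attribute labels. A minimal Python sketch of such a record follows, together with Recall@K, a common way to score whether the target appears in a retriever's top K results. The class name, field names, image ids, and captions are hypothetical illustrations rather than the released Fashion IQ schema, and the metric is shown as a generic retrieval measure, not as the paper's exact evaluation protocol.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class RelativeCaptionExample:
    """Hypothetical record layout for one annotated pair (not the released file schema)."""
    candidate_id: str    # reference garment the shopper is looking at
    target_id: str       # garment the shopper actually wants
    captions: List[str]  # human-written descriptions of how the target differs
    target_attributes: List[str] = field(default_factory=list)  # derived attribute labels


def recall_at_k(rankings: Sequence[Sequence[str]], target_ids: Sequence[str], k: int) -> float:
    """Fraction of queries whose target appears among the top-k ranked image ids."""
    if len(rankings) != len(target_ids):
        raise ValueError("need exactly one ranking per target")
    if not target_ids:
        return 0.0
    hits = sum(1 for ranked, target in zip(rankings, target_ids) if target in ranked[:k])
    return hits / len(target_ids)


example = RelativeCaptionExample(
    candidate_id="img_001",
    target_id="img_042",
    captions=["is red instead of blue", "has shorter sleeves"],
    target_attributes=["red", "short-sleeve"],
)

# Ranked retrieval results for two queries; the first query uses the example's target.
rankings = [["img_042", "img_007"], ["img_003", "img_009", "img_010"]]
targets = [example.target_id, "img_010"]
print(recall_at_k(rankings, targets, k=1))  # 0.5
print(recall_at_k(rankings, targets, k=3))  # 1.0
```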
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Visual Attention and Saliency Detection
