Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

Hui Wu; Yupeng Gao; Xiaoxiao Guo; Ziad Al-Halah; Steven Rennie; Kristen Grauman; Rogerio Feris

arXiv:1905.12794 · cs.CV · November 30, 2020 · 28 cites

TL;DR

The paper introduces the Fashion IQ dataset and a transformer-based interactive image retrieval system to enhance conversational fashion shopping assistants, enabling more natural and effective image search through dialogue.

Contribution

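It presents the first fashion dataset to pair human-generated captions that distinguish similar garment images with real-world product descriptions and derived visual attribute labels, along with a transformer-based user simulator and interactive image retriever for dialog-based retrieval.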

Findings

01 Improved performance over the state of the art in dialog-based image retrieval

02 Dataset enables more natural conversational shopping interactions

03 Transformer-based model integrates visual attributes with image features, user feedback, and dialog history

Abstract

Conversational interfaces for the detail-oriented retail fashion domain are more natural, expressive, and user friendly than classical keyword-based search interfaces. In this paper, we introduce the Fashion IQ dataset to support and advance research on interactive fashion image retrieval. Fashion IQ is the first fashion dataset to provide human-generated captions that distinguish similar pairs of garment images together with side-information consisting of real-world product descriptions and derived visual attribute labels for these images. We provide a detailed analysis of the characteristics of the Fashion IQ data, and present a transformer-based user simulator and interactive image retriever that can seamlessly integrate visual attributes with image features, user feedback, and dialog history, leading to improved performance over the state of the art in dialog-based image retrieval.…
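To make the idea of dialog-based retrieval concrete, here is a minimal toy sketch of the interaction loop: at each turn the system fuses the features of the currently shown image with the user's feedback and the dialog history, then retrieves the closest catalog item. It is not the paper's transformer-based model. The additive fusion, the `decay` weight, the three-dimensional attribute-style vectors (red, blue, long sleeve), and all function and item names are assumptions made purely for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def fuse(candidate_vec, feedback_vec, history_vec, decay=0.5):
    """Hypothetical fusion: candidate features plus feedback, blended with
    a decayed copy of the previous dialog state. A stand-in for a learned
    model, not the method described in the paper."""
    current = [c + f for c, f in zip(candidate_vec, feedback_vec)]
    return normalize([decay * h + c for h, c in zip(history_vec, current)])

def run_dialog(catalog, start, feedback_turns):
    """Run one retrieval step per feedback turn; return the items shown."""
    candidate = start
    history = [0.0] * len(catalog[start])
    shown = []
    for feedback_vec in feedback_turns:
        history = fuse(catalog[candidate], feedback_vec, history)
        candidate = max(catalog, key=lambda name: cosine(history, catalog[name]))
        shown.append(candidate)
    return shown

# Toy catalog; dimensions are hand-made attributes: [red, blue, long_sleeve].
catalog = {
    "red_short": [1.0, 0.0, 0.0],
    "blue_short": [0.0, 1.0, 0.0],
    "blue_long": [0.0, 1.0, 1.0],
}
# Hand-coded stand-ins for encoded feedback: "is blue", then "has longer sleeves".
feedback_turns = [[-1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
result = run_dialog(catalog, "red_short", feedback_turns)
print(result)
```

In a real system the hand-made vectors would be replaced by learned image and text encoders, and the fusion step by a trained model; the sketch only shows how feedback and history can jointly steer the ranking across turns.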

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Visual Attention and Saliency Detection