FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory
Anwesan Pal, Sahil Wadhwa, Ayush Jaiswal, Xu Zhang, Yue Wu, Rakesh Chada, Pradeep Natarajan, and Henrik I. Christensen

TL;DR
FashionNTM is a cascaded memory neural network for multi-turn fashion image retrieval that integrates a user's iterative textual feedback across turns to improve retrieval accuracy and user satisfaction.
Contribution
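The paper proposes the Cascaded Memory Neural Turing Machine (CM-NTM), in which multiple inputs each interact with their own memory through individual read and write heads. This lets the model integrate information across all past turns, and it outperforms existing methods on multi-turn fashion retrieval.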
Findings
Outperforms the previous state of the art by 50.5% on Multi-turn FashionIQ
Achieves a 12.6% improvement on the Multi-turn Shoes dataset
A user study shows an 83.1% preference for FashionNTM's results
Abstract
Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, thereby learning to integrate information across all past turns to retrieve new images for a given turn. Unlike the vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation results show that our proposed method outperforms the previous state-of-the-art algorithm by 50.5% on Multi-turn FashionIQ…
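The abstract describes the CM-NTM only at a high level: several inputs, each paired with its own memory and its own read and write heads, with state carried implicitly across turns. Below is a minimal, non-learned sketch of that idea in plain Python; it is not the authors' implementation. Everything here is a hypothetical assumption: the class names `MemoryBlock` and `CascadedMemory`, content-only addressing, the rule that each block's key is its own input plus the previous block's read vector (one possible reading of "cascaded"), and erase/add vectors computed directly with sigmoid and tanh rather than emitted by a trained LSTM controller.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-8)


class MemoryBlock:
    """Hypothetical NTM-style memory with its own read and write head (content addressing only)."""

    def __init__(self, n_slots, width, seed=0):
        rng = random.Random(seed)  # small random init so slots are distinguishable
        self.memory = [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(n_slots)]

    def address(self, key, beta):
        # Content-based attention: softmax over beta-scaled cosine similarities.
        return softmax([beta * cosine(row, key) for row in self.memory])

    def write(self, key, erase, add, beta=5.0):
        w = self.address(key, beta)
        for i, row in enumerate(self.memory):
            for j in range(len(row)):
                # NTM erase-then-add update, scaled by the write weighting.
                row[j] = row[j] * (1.0 - w[i] * erase[j]) + w[i] * add[j]

    def read(self, key, beta=5.0):
        w = self.address(key, beta)
        width = len(self.memory[0])
        return [sum(w[i] * self.memory[i][j] for i in range(len(w))) for j in range(width)]


class CascadedMemory:
    """Hypothetical cascade: block k is keyed by input k plus block k-1's read vector."""

    def __init__(self, n_inputs, n_slots, width):
        self.width = width
        self.blocks = [MemoryBlock(n_slots, width, seed=k) for k in range(n_inputs)]

    def step(self, inputs):
        # One retrieval turn: `inputs` holds one feature vector per memory block.
        carry = [0.0] * self.width
        for block, x in zip(self.blocks, inputs):
            key = [xi + ci for xi, ci in zip(x, carry)]
            erase = [sigmoid(v) for v in key]  # erase vector in (0, 1)
            add = [math.tanh(v) for v in key]  # add vector in (-1, 1)
            block.write(key, erase, add)
            carry = block.read(key)
        return carry  # last block's read vector, used as this turn's state summary


cm = CascadedMemory(n_inputs=2, n_slots=4, width=3)
turn1 = cm.step([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # toy features (hypothetical)
turn2 = cm.step([[0.0, 0.0, 1.0], [0.5, 0.5, 0.0]])
print(turn1, turn2)
```

Calling `step` once per turn lets every memory accumulate writes from earlier turns, which loosely mirrors the implicit state management the abstract describes; in the actual model the keys, erase vectors, and add vectors would come from learned components.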
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Tanh Activation · Softmax · Location-based Attention · Sigmoid Activation · Content-based Attention · Long Short-Term Memory · Neural Turing Machine
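Several of the listed methods are the components of the standard NTM addressing pipeline for a single head: content-based attention (a softmax over key-to-slot similarity), a sigmoid interpolation gate that blends in the previous weighting, location-based attention via a softmax-weighted circular shift, and sharpening. The pure-Python sketch below chains those steps. The function names, the shift range {-1, 0, 1}, and the example head outputs are illustrative assumptions; the paper may configure its heads differently.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-8)


def content_weights(memory, key, beta):
    # Content-based attention: focus on slots similar to the key.
    return softmax([beta * cosine(row, key) for row in memory])


def interpolate(w_content, w_prev, g):
    # Sigmoid gate g blends the new content focus with the previous weighting.
    return [g * c + (1.0 - g) * p for c, p in zip(w_content, w_prev)]


def circular_shift(w, shift_weights, shifts=(-1, 0, 1)):
    # Location-based attention: circular convolution with a distribution over shifts.
    n = len(w)
    out = [0.0] * n
    for i in range(n):
        for s, sw in zip(shifts, shift_weights):
            out[i] += w[(i - s) % n] * sw
    return out


def sharpen(w, gamma):
    # gamma >= 1 counteracts the blurring introduced by the shift.
    powered = [x ** gamma for x in w]
    total = sum(powered)
    return [x / total for x in powered]


def address(memory, key, beta, w_prev, gate_logit, shift_logits, gamma):
    w_c = content_weights(memory, key, beta)
    w_g = interpolate(w_c, w_prev, sigmoid(gate_logit))
    w_s = circular_shift(w_g, softmax(shift_logits))
    return sharpen(w_s, gamma)


memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
w_prev = [0.25, 0.25, 0.25, 0.25]
# Hypothetical head outputs: key matches slot 0, strong gate, shift of +1.
w = address(memory, [1.0, 0.0], beta=20.0, w_prev=w_prev,
            gate_logit=10.0, shift_logits=[-10.0, -10.0, 10.0], gamma=2.0)
print([round(x, 3) for x in w])
```

Here the key best matches slot 0, but the shift distribution concentrates on +1, so the final weighting lands on slot 1. This shows how location-based addressing can move a head away from where content matching alone would put it.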
