TL;DR
MovingFashion introduces a new dataset and a novel retrieval network for identifying clothing items worn in social media videos. The network reaches 80% top-5 accuracy and outperforms existing methods on the video-to-shop e-fashion challenge.
Contribution
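The paper presents the first publicly available dataset for video-to-shop clothing retrieval. It also presents a new network, SEAM Match-RCNN, which is trained with image-to-video domain adaptation. This training needs only the association between each video and its shop images, not annotated bounding boxes.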
Findings
Retrieved the correct product within the top 5 results in 80% of cases (top-5 accuracy).
Outperformed existing state-of-the-art methods on the MovingFashion dataset.
Demonstrated the effectiveness of an attention-based weighted sum of video frames.
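Top-5 accuracy, as used above, presumably means the fraction of query videos whose correct shop item appears among the five highest-ranked gallery images. The following minimal NumPy sketch computes that metric. It assumes a precomputed query-by-gallery similarity matrix. The function name and inputs are illustrative and are not the authors' evaluation code.

```python
import numpy as np

def top_k_accuracy(similarity: np.ndarray, gt_index: np.ndarray, k: int = 5) -> float:
    """Fraction of queries whose ground-truth gallery item is among the k most similar.

    similarity: (num_queries, gallery_size) scores, higher means more similar.
    gt_index:   (num_queries,) index of the correct shop item for each query.
    """
    # Indices of the k highest-scoring gallery items for each query.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # A query counts as a hit if its correct item appears anywhere in its top k.
    hits = (top_k == gt_index[:, None]).any(axis=1)
    return float(hits.mean())
```

For example, with two queries whose correct item is index 0 in both cases, `top_k_accuracy(np.array([[0.9, 0.1, 0.5], [0.2, 0.3, 0.8]]), np.array([0, 0]), k=1)` counts only the first query as a hit.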
Abstract
Retrieving clothes worn in social media videos (Instagram, TikTok) is the latest frontier of e-fashion, referred to as "video-to-shop" in the computer vision literature. In this paper we present MovingFashion, the first publicly available dataset to cope with this challenge. MovingFashion is composed of 14855 social videos, each associated with e-commerce "shop" images in which the corresponding clothing items are clearly portrayed. In addition, we present a network for retrieving the shop images in this scenario, dubbed SEAM Match-RCNN. The model is trained by image-to-video domain adaptation, which allows the use of video sequences for which only the association with a shop image is given, eliminating the need for millions of annotated bounding boxes. SEAM Match-RCNN builds an embedding where an attention-based weighted sum of a few frames (10) of a social video is enough to…
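The abstract says SEAM Match-RCNN represents a video with an attention-based weighted sum of about 10 frames. This summary does not describe the actual attention module. The following NumPy sketch therefore assumes the simplest form:

- A hypothetical scoring vector `w` gives each frame one score.
- A softmax turns the scores into weights.
- The weighted sum of frame embeddings is ranked against shop-image embeddings by cosine similarity.

Frame and shop features are assumed to be precomputed, for example by a detector backbone. All names are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(x - x.max())
    return e / e.sum()

def video_embedding(frame_feats: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Attention-weighted sum of per-frame embeddings (assumed form).

    frame_feats: (num_frames, dim), e.g. 10 frames sampled from a social video.
    w:           (dim,) hypothetical learned vector that scores each frame.
    """
    scores = frame_feats @ w          # one scalar score per frame
    alpha = softmax(scores)           # attention weights, summing to 1
    emb = alpha @ frame_feats         # weighted sum of frame embeddings, shape (dim,)
    return emb / np.linalg.norm(emb)  # L2-normalize so dot products are cosines

def rank_shop_images(video_emb: np.ndarray, shop_embs: np.ndarray) -> np.ndarray:
    """Return gallery indices sorted from most to least cosine-similar."""
    shop = shop_embs / np.linalg.norm(shop_embs, axis=1, keepdims=True)
    return np.argsort(-(shop @ video_emb))

# Usage with random stand-in features: 10 frames, a 100-image shop gallery.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 64))
w = rng.normal(size=64)
gallery = rng.normal(size=(100, 64))
print(rank_shop_images(video_embedding(frames, w), gallery)[:5])
```

The softmax weighting lets frames with higher scores dominate the video embedding. Plain average pooling is the special case in which all frames receive equal scores.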
Videos
MovingFashion: a Benchmark for the Video-to-Shop Challenge (YouTube)
Taxonomy
Methods: Self-supervised Equivariant Attention Mechanism
