Fashion Focus: Multi-modal Retrieval System for Video Commodity Localization in E-commerce
Yanhao Zhang, Qiang Wang, Pan Pan, Yun Zheng, Cheng Da, Siyang Sun, and Yinghui Xu

TL;DR
Fashion Focus is a multi-modal retrieval system that accurately localizes products in e-commerce videos by integrating visual, linguistic, and interaction data, streamlining the matching process for sellers and consumers.
Contribution
It introduces a unified multi-modal framework for precise product localization in untrimmed videos, combining content structuring and retrieval techniques.
Findings
Achieves accurate video-to-shop matching automatically.
Effectively integrates multiple modalities for localization.
Enhances user experience in e-commerce video shopping.
Abstract
Live-stream and short-video shopping in e-commerce have grown exponentially. However, sellers must manually match images of the products on sale to the timestamps at which they are exhibited in the untrimmed video, which makes for a complicated process. To solve this problem, we present a demonstration of a multi-modal retrieval system called "Fashion Focus", which precisely localizes the product images in the online video as the focuses. The modalities that contribute to commodity localization, including visual content, linguistic features, and interaction context, are jointly investigated via the presented multi-modal learning. Our system employs two procedures for analysis, video content structuring and multi-modal retrieval, to automatically achieve accurate video-to-shop matching. Fashion Focus presents a unified framework that can orientate the…
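The two procedures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that video content structuring has already split the video into timestamped segments with one feature vector per modality, and it swaps in a simple weighted late fusion of cosine similarities for the paper's multi-modal learning. The names `localize_product`, `cosine`, the segment dictionary layout, and the modality weights are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def localize_product(
    product: Dict[str, Vector],
    segments: List[dict],
    weights: Dict[str, float],
) -> Tuple[Tuple[float, float], float]:
    """Return ((start, end), score) of the segment that best matches the product.

    Assumed (hypothetical) layout: each segment is a dict with "start", "end",
    and "features", where "features" maps a modality name to a vector.
    The score is a weighted sum of per-modality cosine similarities.
    """
    if not segments:
        raise ValueError("segments must not be empty")
    best_span = None
    best_score = float("-inf")
    for seg in segments:
        score = sum(
            w * cosine(product[m], seg["features"][m]) for m, w in weights.items()
        )
        if score > best_score:
            best_span = (seg["start"], seg["end"])
            best_score = score
    return best_span, best_score


# Toy input: two segments, two modalities, made-up 2-D features.
toy_product = {"visual": [1.0, 0.0], "text": [0.0, 1.0]}
toy_segments = [
    {"start": 0.0, "end": 5.0,
     "features": {"visual": [0.0, 1.0], "text": [1.0, 0.0]}},
    {"start": 5.0, "end": 12.0,
     "features": {"visual": [1.0, 0.0], "text": [0.0, 1.0]}},
]
toy_weights = {"visual": 0.5, "text": 0.5}
span, score = localize_product(toy_product, toy_segments, toy_weights)
```

In this toy input the second segment matches the product in both modalities, so it is the one selected. A real system would obtain the segment features from learned encoders and would likely learn the fusion rather than fix the weights by hand.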
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
