TL;DR
This paper introduces the Multi-Queue Momentum Contrast (MQMC) network for efficient bidirectional retrieval between micro-videos and products, addressing the challenges posed by multi-modal video data recorded with unprofessional equipment.
Contribution
The paper is the first to explore multi-modal microvideo-product retrieval, and it proposes the MQMC network, which uses a discriminative multi-queue strategy to improve retrieval accuracy.
Findings
MQMC outperforms state-of-the-art baselines in experiments.
Two large-scale datasets were collected and used for evaluation.
The approach effectively handles multi-modal and unprofessional video data.
Abstract
The booming development and huge market of micro-videos bring new e-commerce channels for merchants. Currently, more micro-video publishers prefer to embed relevant ads into their micro-videos, which not only provides them with business income but also helps audiences discover products that interest them. However, because micro-videos are recorded with unprofessional equipment, cover various topics, and include multiple modalities, it is challenging to locate the products related to a micro-video efficiently, appropriately, and accurately. We formulate the microvideo-product retrieval task, which is the first attempt to explore retrieval between multi-modal and multi-modal instances. A novel approach named the Multi-Queue Momentum Contrast (MQMC) network is proposed for bidirectional retrieval, consisting of uni-modal feature and multi-modal instance representation learning.…
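For readers unfamiliar with momentum contrast, the sketch below illustrates the general idea the method's name refers to: a key encoder that slowly tracks the query encoder, a FIFO queue of past key embeddings reused as negatives, and an InfoNCE-style contrastive loss. It is a generic illustration, not the authors' implementation. The abstract excerpt does not describe how MQMC organizes its multiple queues, so this sketch assumes a single queue, and every function name, the momentum value, the temperature, and the toy vectors are hypothetical.

```python
import math
from collections import deque


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def momentum_update(query_params, key_params, m=0.999):
    """Key encoder slowly follows the query encoder: k <- m * k + (1 - m) * q."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """Negative log-softmax of the positive pair's similarity among all keys."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(n)) / temperature for n in negative_keys]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]


# Assumption: one FIFO queue of past key embeddings serves as the negatives.
queue = deque(maxlen=4)
for key in ([0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]):
    queue.append(key)

# Toy 2-D embeddings standing in for a micro-video query and product keys.
video_query = [1.0, 0.1]
matching_product = [0.9, 0.2]
unrelated_product = [-1.0, 0.0]

loss_match = info_nce(video_query, matching_product, list(queue))
loss_mismatch = info_nce(video_query, unrelated_product, list(queue))
```

In this toy setup the loss is lower when the query is paired with the similar product than with the dissimilar one, which is the signal a contrastive retrieval model is trained on. For bidirectional retrieval, the same loss could be applied in both directions (video as query, product as key, and the reverse), though whether MQMC does exactly this is not stated in the excerpt.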
