Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval

Xiaowan Hu; Yiyi Chen; Yan Li; Minquan Wang; Haoqian Wang; Quan Chen; Han Li; Peng Jiang

arXiv:2407.16248 · cs.CV · August 6, 2024

TL;DR

This paper introduces SGMN, a spatiotemporal graph-based multi-modal network that improves livestreaming product retrieval. It combines text-guided attention, long-range graph modeling, and hard example mining to address three challenges: distractor products, video-image heterogeneity, and subtle differences between similar products.
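The "long-range graph modeling" idea can be illustrated with a small sketch. It assumes, purely for illustration, that each node is a feature vector for a product region in some frame, that edges join any pair of nodes whose cosine similarity exceeds a threshold regardless of how far apart their frames are, and that one round of mean aggregation mixes features along those edges. The helper names `build_graph` and `propagate` and the `threshold` parameter are hypothetical; this is not the SGMN graph construction, whose details are in the paper and the official repository.

```python
import math

# Hypothetical sketch: nodes are (frame, region) feature vectors; edges are
# added by feature similarity alone, so distant frames can be linked
# ("long-range"). This is an illustrative assumption, not SGMN's exact graph.


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(nodes, threshold=0.5):
    """Binary adjacency matrix with self-loops; nodes i and j are linked when
    their cosine similarity exceeds `threshold` (hypothetical parameter)."""
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or cosine(nodes[i], nodes[j]) > threshold:
                adj[i][j] = 1.0
    return adj


def propagate(nodes, adj):
    """One message-passing step: each node becomes the mean of its neighbours
    (row-normalised adjacency times the feature matrix)."""
    dim = len(nodes[0])
    out = []
    for row in adj:
        deg = sum(row)  # always >= 1 because of the self-loop
        out.append([sum(w * node[d] for w, node in zip(row, nodes)) / deg
                    for d in range(dim)])
    return out


if __name__ == "__main__":
    nodes = [
        [1.0, 0.0],  # intended product, frame 0
        [0.9, 0.1],  # same product seen again, frame 120
        [0.0, 1.0],  # background distractor
    ]
    adj = build_graph(nodes)  # links nodes 0 and 1 only (plus self-loops)
    print(adj)
    print(propagate(nodes, adj))
```

In the toy call, the two views of the intended product are linked even though they come from distant frames, so their features are averaged together, while the unrelated background region keeps its own features.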

Contribution

The paper presents a new multi-modal network architecture that effectively handles distractors, video-image heterogeneity, and subtle product differences in livestreaming product retrieval.

Findings

01. SGMN significantly outperforms state-of-the-art methods.
02. Text-guided attention improves the model's focus on the intended products.
03. Hard example mining enhances fine-grained product discrimination.
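Hard example mining is commonly implemented as a triplet loss that, for each anchor, keeps only the most confusable negative. The sketch below assumes that video clips and shop product images are already embedded in a shared space and uses a batch-hard style hinge loss. The function name `hard_triplet_loss`, the Euclidean distance, and the `margin` value are illustrative assumptions, and SGMN's actual mining strategy may differ.

```python
import math

# Hypothetical sketch of hard negative mining for video-to-image retrieval.
# Assumes videos and product images share one embedding space.


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_triplet_loss(video_embs, video_labels, image_embs, image_labels, margin=0.2):
    """Mean over video anchors of
    max(0, margin + d(anchor, hardest positive) - d(anchor, hardest negative))."""
    losses = []
    for v, label in zip(video_embs, video_labels):
        pos = [euclidean(v, img) for img, il in zip(image_embs, image_labels) if il == label]
        neg = [euclidean(v, img) for img, il in zip(image_embs, image_labels) if il != label]
        if not pos or not neg:
            continue  # an anchor needs at least one positive and one negative
        hardest_pos = max(pos)  # farthest image of the same product
        hardest_neg = min(neg)  # closest image of a different product
        losses.append(max(0.0, margin + hardest_pos - hardest_neg))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    videos = [[0.0, 0.0]]
    video_labels = ["dress_A"]
    images = [[0.0, 0.1], [0.0, 0.15], [3.0, 0.0]]
    image_labels = ["dress_A", "dress_B", "mug_C"]  # dress_B is a look-alike
    print(hard_triplet_loss(videos, video_labels, images, image_labels))
```

With the look-alike `dress_B` in the gallery, the loss is positive and yields a training signal. If only the easy `mug_C` negative were available, the hinge would be zero and the model would learn nothing about the fine-grained difference between the two dresses.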

Abstract

With the rapid expansion of e-commerce, more consumers have become accustomed to making purchases via livestreaming. Accurately identifying the products being sold by salespeople, i.e., livestreaming product retrieval (LPR), is a fundamental and daunting challenge. In real-world scenarios, the LPR task faces three primary dilemmas: 1) distinguishing the intended products from distractor products present in the background; 2) video-image heterogeneity, in that the appearance of products showcased in live streams often deviates substantially from the standardized product images in stores; and 3) the many confusing products in a shop that differ only in subtle visual nuances. To tackle these challenges, we propose the Spatiotemporal Graphing Multi-modal Network (SGMN). First, we employ a text-guided attention mechanism that leverages the spoken content of salespeople to guide the model to…
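The text-guided attention step can be sketched as cross-attention in which an embedding of the salesperson's speech acts as the query over visual region features, so regions that match what is being said receive higher softmax weights. This assumes, for illustration only, that the transcript has already been encoded into a vector of the same dimension as the region features, and it omits the learned query, key, and value projections a real model would use. `text_guided_attention` is a hypothetical name, not the SGMN API.

```python
import math

# Hypothetical sketch: scaled dot-product attention with a text query.
# Learned Q/K/V projections are omitted for brevity.


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_attention(text_query, region_feats):
    """Return (pooled visual feature, attention weights). The text vector is
    the query; the region features serve as both keys and values."""
    d = len(text_query)
    scores = [sum(q * k for q, k in zip(text_query, r)) / math.sqrt(d)
              for r in region_feats]
    weights = softmax(scores)
    pooled = [sum(w * r[i] for w, r in zip(weights, region_feats))
              for i in range(d)]
    return pooled, weights


if __name__ == "__main__":
    speech = [2.0, 0.0]  # e.g. an embedding of "this red dress ..."
    regions = [
        [2.0, 0.0],      # region showing the dress being sold
        [0.0, 2.0],      # background product (distractor)
    ]
    pooled, weights = text_guided_attention(speech, regions)
    print(weights)  # the dress region receives most of the weight
```

Because the softmax normalizes over regions, boosting the region that matches the speech automatically suppresses background distractors in the pooled feature.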

Code & Models

Repositories

huxiaowan/sgmn (PyTorch, official)

Taxonomy

Methods: Softmax · Attention Is All You Need · Focus