Cross-view Semantic Alignment for Livestreaming Product Recognition

Wenjie Yang; Yiyi Chen; Yan Li; Yanhua Cheng; Xudong Liu; Quan Chen; Han Li

arXiv:2308.04912·cs.CV·August 22, 2023

TL;DR

This paper introduces LPR4M, a large-scale multimodal dataset for livestreaming product recognition, and proposes RICE, a model that uses contrastive learning and patch-level feature propagation to improve cross-view semantic alignment.

Contribution

The paper presents a new multimodal dataset, LPR4M, and a novel model, RICE, that enhances cross-view semantic alignment for livestreaming product recognition.

Findings

1. RICE outperforms existing methods in recognition accuracy.
2. The LPR4M dataset covers diverse categories and modalities, reflecting real-world scenarios.
3. Patch Feature Reconstruction effectively reduces semantic misalignment.

Abstract

Live commerce is the act of selling products online through live streaming. Customers' diverse demands for online products introduce more challenges to Livestreaming Product Recognition. Previous works have primarily focused on fashion clothing data or used single-modal input, which does not reflect the real-world scenario where multimodal data from various categories are present. In this paper, we present LPR4M, a large-scale multimodal dataset that covers 34 categories, comprises 3 modalities (image, video, and text), and is 50x larger than the largest publicly available dataset. LPR4M contains diverse videos and noise modality pairs while exhibiting a long-tailed distribution, resembling real-world problems. Moreover, a cRoss-vIew semantiC alignmEnt (RICE) model is proposed to learn discriminative instance features from the image and video views of the products. This is…

Code & Models

Repositories

adxcreative/rice (official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Face recognition and analysis

Methods: Contrastive Learning
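
To make the contrastive-learning idea concrete, here is a minimal sketch of a symmetric InfoNCE-style loss between image-view and video-view embeddings of the same products. It is an illustration only, not the RICE implementation: the function name `cross_view_contrastive_loss`, the temperature value, and the assumption that row i of each array describes the same product are all hypothetical choices made for this example.

```python
import numpy as np


def cross_view_contrastive_loss(image_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    Assumption (not from the paper): row i of image_emb and row i of
    video_emb describe the same product; all other rows are negatives.
    """
    # L2-normalise so the dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    vid = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = img @ vid.T / temperature

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row.
        shifted = scores - scores.max(axis=1, keepdims=True)
        log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        # The matching pair sits on the diagonal.
        return -np.mean(np.diag(log_prob))

    # Average the image-to-video and video-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(8, 16))
    # Videos that are slightly perturbed copies of the paired images.
    videos = images + 0.05 * rng.normal(size=(8, 16))
    print("aligned pairs:   ", cross_view_contrastive_loss(images, videos))
    # Shifting the rows breaks the pairing, so the loss should rise.
    print("misaligned pairs:", cross_view_contrastive_loss(images, np.roll(videos, 1, axis=0)))
```

The loss is low when each image embedding is closest to its own video embedding and rises when the pairing is broken, which is the instance-level behaviour that contrastive training encourages. The patch-level feature propagation mentioned in the TL;DR is not covered by this sketch.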