Towards Bridging the Cross-modal Semantic Gap for Multi-modal Recommendation

Xinglong Wu; Anfeng Huang; Hongwei Yang; Hui He; Yu Tai; Weizhe Zhang

arXiv:2407.05420 · cs.IR · July 9, 2024 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces CLIPER, a multi-modal recommendation framework that leverages cross-modal alignment to bridge semantic gaps and improve recommendation accuracy by extracting fine-grained semantic information.

Contribution

The paper proposes a novel multi-view modality-alignment approach using CLIP to better capture cross-modal semantics for recommendation tasks.

Findings

01. CLIPER outperforms existing models on three public datasets.

02. The multi-view alignment improves semantic representation quality.

03. The approach effectively bridges the semantic gap across modalities.

Abstract

Multi-modal recommendation greatly enhances the performance of recommender systems by modeling the auxiliary information from multi-modality contents. Most existing multi-modal recommendation models primarily exploit multimedia information propagation processes to enrich item representations and directly utilize modal-specific embedding vectors independently obtained from upstream pre-trained models. However, this might be inappropriate since the abundant task-specific semantics remain unexplored, and the cross-modality semantic gap hinders the recommendation performance. Inspired by the recent progress of the cross-modal alignment model CLIP, in this paper, we propose a novel CLIP Enhanced Recommender (CLIPER) framework to bridge the semantic gap between modalities and extract fine-grained multi-view semantic information. Specifically, we introduce…
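
For readers unfamiliar with the cross-modal alignment that the abstract refers to, the following is a minimal sketch of a generic CLIP-style symmetric contrastive objective in plain Python. It is not CLIPER's actual loss or architecture, which the truncated abstract does not specify. The function names, the temperature value of 0.07, and the toy embeddings are all assumptions made for illustration.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cross_entropy_on_diagonal(logits):
    """Mean of -log softmax(row_i)[i]: row i's correct match is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_style_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over paired image/text embeddings.

    Hypothetical helper: row i of image_emb and row i of text_emb are
    assumed to describe the same item; all other rows act as negatives.
    """
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits_t))


# Toy data (an assumption): image embeddings are noisy copies of text embeddings.
random.seed(0)
text_emb = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
image_emb = [[v + random.gauss(0, 0.01) for v in row] for row in text_emb]

aligned_loss = clip_style_alignment_loss(image_emb, text_emb)
shuffled_loss = clip_style_alignment_loss(image_emb[::-1], text_emb)
print(aligned_loss < shuffled_loss)
```

In this sketch the loss is low when each item's image and text embeddings point in the same direction and high when the pairing is scrambled, which is the sense in which such an objective narrows a semantic gap between modalities.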

Code & Models

Repositories

WuXinglong-HIT/CLIPER
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques · Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training