A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation
Zixuan Yi, Iadh Ounis

TL;DR
This paper introduces UGT, a unified graph transformer that jointly extracts and fuses multi-modal features for improved recommendation accuracy in multimedia e-commerce platforms.
Contribution
The paper proposes a novel unified graph transformer model that integrates feature extraction and modality fusion into a single framework for multi-modal recommendation.
Findings
UGT outperforms existing models in recommendation accuracy.
Joint optimization enhances multi-modal feature integration.
UGT delivers significant improvements on multi-modal recommendation tasks.
Abstract
With the rapid development of online multimedia services, especially in e-commerce platforms, there is a pressing need for personalised recommendation systems that can effectively encode the diverse multi-modal content associated with each item. However, we argue that existing multi-modal recommender systems typically use isolated processes for both feature extraction and modality modelling. Such isolated processes can harm recommendation performance. First, an isolated extraction process underestimates the importance of effective feature extraction in multi-modal recommendation, potentially incorporating irrelevant information, which is harmful to item representations. Second, an isolated modality modelling process produces disjointed embeddings for item modalities due to the individual processing of each modality, which leads to a suboptimal fusion of user/item…
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Graph Neural Networks
Methods: Attention Is All You Need · Laplacian EigenMap · Label Smoothing · Laplacian Positional Encodings · Adam · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer
