Pre-training Graph Transformer with Multimodal Side Information for Recommendation
Yong Liu, Susen Yang, Chenyi Lei, Guoxin Wang, Haihong Tang, Juyong Zhang, Aixin Sun, Chunyan Miao

TL;DR
This paper introduces a pre-training method for graph transformers that leverages multimodal item side information and user activity-based relations to improve recommendation accuracy and other downstream tasks.
Contribution
It proposes a novel pre-training strategy using a homogeneous item graph with multimodal data and a new sampling algorithm, MCNSampling, for enhanced item representation learning.
Findings
PMGT outperforms baseline models in recommendation accuracy
Effective utilization of multimodal side information improves downstream task performance
Case study confirms scalability and real-world applicability
Abstract
Side information of items, e.g., images and text descriptions, has been shown to be effective in contributing to accurate recommendations. Inspired by the recent success of pre-training models on natural language and images, we propose a pre-training strategy to learn item representations by considering both item side information and their relationships. We relate items by common user activities, e.g., co-purchase, and construct a homogeneous item graph. This graph provides a unified view of item relations and their associated multimodal side information. We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item. The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction. Experimental results on real datasets…
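The item-graph construction step described in the abstract can be illustrated with a minimal sketch. It assumes co-purchase is the only relation and that an edge's weight is the number of users who bought both items; the function name `build_item_graph`, the weighting scheme, and the toy data are hypothetical and not taken from the paper. MCNSampling and the two pre-training objectives are not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_item_graph(user_baskets):
    """Build an undirected, homogeneous item graph from user purchases.

    Assumption (not from the paper): two items are linked when at least one
    user bought both, and the edge weight counts how many users did so.
    Returns {item: {neighbor: co_purchase_count}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for items in user_baskets.values():
        # Deduplicate so repeat purchases by one user count once per pair.
        for a, b in combinations(sorted(set(items)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {item: dict(neighbors) for item, neighbors in graph.items()}


# Hypothetical toy data: user id -> purchased item ids.
baskets = {
    "u1": ["camera", "tripod", "sd_card"],
    "u2": ["camera", "sd_card"],
    "u3": ["tripod", "lamp"],
}
item_graph = build_item_graph(baskets)
```

In a setup like this, each node would additionally carry its multimodal features (for example an image embedding and a text embedding), which the graph transformer would consume alongside the neighbors selected for each item.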
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Dropout · Byte Pair Encoding · Dense Connections · Label Smoothing · Attention Is All You Need · Multi-Head Attention
