Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation
Wei Ji, Xiangyan Liu, An Zhang, Yinwei Wei, Yongxin Ni, Xiang Wang

TL;DR
This paper introduces ODMT, a multi-modal transformer framework for sequential recommendation that fuses item ID and multi-modal features through an ID-aware module and online distillation, reporting roughly a 10% performance improvement over baselines.
Contribution
The paper proposes a model-agnostic framework that combines an ID-aware multi-modal transformer with online distillation to improve multi-modal sequential recommendation.
Findings
Approximately 10% performance improvement over baselines
Effective multi-source feature interaction and mutual learning
Enhanced robustness in recommendation predictions
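To make the "mutual learning" finding concrete, here is a minimal sketch of an online (mutual) distillation loss in plain Python. It is not taken from the paper: the branch names ("id", "text", "image"), the temperature, the weight alpha, and the exact loss form (cross-entropy plus the average KL divergence from each peer's softened prediction) are all assumptions, following the generic mutual-learning recipe rather than ODMT's actual objective.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Negative log-likelihood of the ground-truth next item."""
    return -math.log(probs[target])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same items."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_distillation_losses(logits_by_branch, target, temperature=2.0, alpha=0.5):
    """Per-branch loss: supervised term plus agreement with every peer branch.

    Hypothetical helper for illustration. Requires at least two branches.
    In a real training loop the peer distributions would be treated as
    constants (no gradient flows into the peers through this term).
    """
    soft = {name: softmax(l, temperature) for name, l in logits_by_branch.items()}
    losses = {}
    for name, logits in logits_by_branch.items():
        ce = cross_entropy(softmax(logits), target)
        peers = [p for other, p in soft.items() if other != name]
        kd = sum(kl_divergence(peer, soft[name]) for peer in peers) / len(peers)
        losses[name] = ce + alpha * kd
    return losses

# Example: three assumed predictors scoring the same three candidate items.
example = mutual_distillation_losses(
    {"id": [2.0, 0.5, -1.0], "text": [1.5, 0.7, -0.5], "image": [0.2, 1.0, 0.1]},
    target=0,
)
```

Because every branch acts as both teacher and student during the same training run, no separately pre-trained teacher is needed, which is what distinguishes online distillation from the classic two-stage setup.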
Abstract
Multi-modal recommendation systems, which integrate diverse types of information, have gained widespread attention in recent years. However, compared to traditional collaborative filtering-based multi-modal recommendation systems, research on multi-modal sequential recommendation is still in its nascent stages. Unlike traditional sequential recommendation models that solely rely on item identifier (ID) information and focus on network structure design, multi-modal recommendation models need to emphasize item representation learning and the fusion of heterogeneous data sources. This paper investigates the impact of item representation learning on downstream recommendation tasks and examines the disparities in information fusion at different stages. Empirical experiments are conducted to demonstrate the need to design a framework suitable for collaborative learning and fusion of diverse…
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Graph Neural Networks · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding
