Multimodal Enhancement of Sequential Recommendation
Bucher Sahyouni, Matthew Vowels, Liqun Chen, Simon Hadfield

TL;DR
This paper introduces MuSTRec, a transformer-based framework that unifies multimodal and sequential recommendation. It combines cross-item similarities drawn from text and visual features with short- and long-term user preferences, and improves on state-of-the-art baselines by up to 33.5% across multiple Amazon datasets.
Contribution
MuSTRec is the first framework to unify multimodal and sequential recommendation paradigms using a transformer-based approach with novel data partitioning and user embedding integration.
Findings
MuSTRec outperforms state-of-the-art baselines by up to 33.5% on Amazon datasets.
Integrating user embeddings can increase short-term metrics by up to 200%.
A new data partitioning regime is necessary for multimodal sequential recommendation.
Abstract
We propose a novel recommender framework, MuSTRec (Multimodal and Sequential Transformer-based Recommendation), that unifies multimodal and sequential recommendation paradigms. MuSTRec captures cross-item similarities and collaborative filtering signals by building item-item graphs from extracted text and visual features. A frequency-based self-attention module additionally captures the short- and long-term user preferences. Across multiple Amazon datasets, MuSTRec demonstrates superior performance (up to 33.5% improvement) over multimodal and sequential state-of-the-art baselines. Finally, we detail some interesting facets of this new recommendation paradigm. These include the need for a new data partitioning regime and a demonstration of how integrating user embeddings into sequential recommendation leads to drastically increased short-term metrics (up to 200% improvement) on…
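The abstract mentions item-item graphs built from extracted text and visual features but does not give the construction. One common way to build such a graph, shown here only as a minimal sketch and not as the authors' method, is to connect each item to its k most cosine-similar neighbours per modality and then blend the two graphs. The function names `knn_item_graph` and `fuse_graphs`, the value of `k`, the blending weight, and the random feature matrices are all assumptions made for illustration.

```python
import numpy as np


def knn_item_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Binary (n_items, n_items) adjacency linking each item to its k most
    cosine-similar neighbours. Hypothetical helper; requires k < n_items."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sim = unit @ unit.T                            # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                 # exclude self-loops
    top_k = np.argsort(-sim, axis=1)[:, :k]        # k best neighbours per row
    adj = np.zeros(sim.shape, dtype=np.float32)
    rows = np.repeat(np.arange(sim.shape[0]), k)
    adj[rows, top_k.ravel()] = 1.0
    return adj


def fuse_graphs(text_adj: np.ndarray, image_adj: np.ndarray,
                text_weight: float = 0.5) -> np.ndarray:
    """Weighted blend of the two modality graphs (the weight is an assumption)."""
    return text_weight * text_adj + (1.0 - text_weight) * image_adj


# Random stand-ins for pre-extracted item features (6 items).
rng = np.random.default_rng(0)
text_features = rng.normal(size=(6, 8))    # e.g. sentence-encoder outputs
image_features = rng.normal(size=(6, 16))  # e.g. vision-encoder outputs

text_adj = knn_item_graph(text_features, k=2)
image_adj = knn_item_graph(image_features, k=2)
item_graph = fuse_graphs(text_adj, image_adj)
```

In this sketch an edge that appears in both modality graphs ends up with weight 1.0, while an edge found in only one modality gets that modality's share of the weight. The resulting matrix could then be normalised and passed to a graph layer; how MuSTRec does this is not stated in the abstract.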
Taxonomy
Topics: Recommender Systems and Techniques · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
