SEMINAR: Search Enhanced Multi-modal Interest Network and Approximate Retrieval for Lifelong Sequential Recommendation
Kaiming Shen, Xichen Ding, Zixiang Zheng, Yuqi Gong, Qianqian Li, Zhongyi Liu, Guannan Zhang

TL;DR
This paper introduces SEMINAR, a unified model for lifelong multi-modal sequence recommendation that improves user interest modeling, aligns multi-modal embeddings, and accelerates retrieval with approximate methods.
Contribution
The paper proposes SEMINAR, a novel lifelong multi-modal sequence model with a pretraining-finetuning framework and a codebook-based retrieval strategy for recommendation systems.
Findings
Effective multi-modal alignment achieved
Improved lifelong sequence modeling performance
Fast approximate retrieval with codebook strategy
Abstract
Modeling users' behaviors is crucial in modern recommendation systems. Much research focuses on modeling users' lifelong sequences, which can be extremely long and sometimes exceed thousands of items. These models use the target item to search for the most relevant items in the historical sequence. However, training on lifelong sequences for click-through rate (CTR) prediction or personalized search ranking (PSR) is extremely difficult because ID embeddings are insufficiently learned, especially when IDs in the lifelong sequence features do not appear in the training samples. Additionally, existing target attention mechanisms struggle to learn the multi-modal representations of items in the sequence. The distributions of the multi-modal embeddings (text, image, and attributes) of a user's interacted items are not properly aligned and there exist…
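The search step the abstract mentions (using the target item to pick the most relevant items from a long history) can be illustrated with a minimal sketch. This is not SEMINAR's implementation: the dot-product score, the softmax pooling, and the names `top_k_relevant` and `attend` are assumptions chosen for illustration only.

```python
import heapq
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_relevant(target, history, k):
    """Indices of the k history items scoring highest against the target.

    Hypothetical helper: relevance is a plain dot product here, which is
    an assumption and not necessarily the score the paper uses.
    Ties are broken in favour of the earlier item.
    """
    scored = ((dot(target, h), -i) for i, h in enumerate(history))
    return [-neg_i for _, neg_i in heapq.nlargest(k, scored)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(target, history, k):
    """Search the history for k items, then pool them with target attention."""
    idx = top_k_relevant(target, history, k)
    weights = softmax([dot(target, history[i]) for i in idx])
    pooled = [
        sum(w * history[i][d] for w, i in zip(weights, idx))
        for d in range(len(target))
    ]
    return idx, pooled


# Toy example: four 2-d item embeddings and one target embedding.
history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
target = [1.0, 0.0]
indices, interest = attend(target, history, k=2)
```

Scoring every item exactly, as this sketch does, costs time linear in the sequence length for each target item, which is the cost that approximate retrieval aims to reduce.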
Taxonomy
Topics: Recommender Systems and Techniques · Machine Learning in Healthcare
Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
