LEMUR: Large scale End-to-end MUltimodal Recommendation
Xintian Han, Honggang Chen, Quan Lin, Jingyue Gao, Xiangyuan Ren, Lifei Zhu, Zhisheng Ye, Shikang Wu, XiongHang Xie, Xiaochu Gan, Bingzheng Wei, Peng Xu, Zhe Wang, Yuchao Zheng, Jingjian Lin, Di Wu, Junfeng Ge

TL;DR
LEMUR is a large-scale, end-to-end multimodal recommender system that jointly optimizes multimodal and recommendation tasks, improving alignment, adaptability, and efficiency in real-world industrial applications.
Contribution
LEMUR introduces the first end-to-end training framework for large-scale multimodal recommendation, integrating multimodal learning with recommendation objectives and proposing a memory bank for efficient representation accumulation.
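The summary above only says that the memory bank supports efficient representation accumulation; it does not give its design. As a rough illustration of the general idea, the sketch below caches item representations by ID so that items seen earlier do not have to be re-encoded when they reappear in a user history. The class name `MemoryBank`, the fixed capacity, the oldest-write eviction, and the zero-vector fallback are all assumptions made for this example, not details from the paper.

```python
from collections import OrderedDict
from typing import List, Sequence


class MemoryBank:
    """Hypothetical fixed-capacity cache of item representations keyed by item ID."""

    def __init__(self, dim: int, capacity: int) -> None:
        self.dim = dim
        self.capacity = capacity
        self._store: "OrderedDict[str, List[float]]" = OrderedDict()

    def write(self, item_id: str, embedding: Sequence[float]) -> None:
        """Store the newest representation of an item, evicting the oldest write if full."""
        if len(embedding) != self.dim:
            raise ValueError("embedding has the wrong dimension")
        # Overwrite any stale vector and mark the item as most recently written.
        self._store[item_id] = list(embedding)
        self._store.move_to_end(item_id)
        # Assumed policy: drop the least recently written items when over capacity.
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)

    def read(self, item_ids: Sequence[str]) -> List[List[float]]:
        """Look up cached representations; unseen or evicted items get a zero vector."""
        zero = [0.0] * self.dim
        return [list(self._store.get(item_id, zero)) for item_id in item_ids]


# Usage: representations produced during training are accumulated and reused later.
bank = MemoryBank(dim=2, capacity=2)
bank.write("video_a", [1.0, 2.0])
bank.write("video_b", [3.0, 4.0])
bank.write("video_c", [5.0, 6.0])  # evicts "video_a", the oldest write
history = bank.read(["video_a", "video_c"])
```

In an end-to-end setting the cached vectors would go stale as the encoder is updated, so a real system would need some refresh policy; this sketch leaves that out.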
Findings
Reduced query change rate decay by 0.843% in Douyin Search.
Achieved 0.81% improvement in QAUC for Douyin Search.
Demonstrated significant offline metric gains in Douyin Advertisement.
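The findings above report QAUC without defining it. In search ranking, QAUC usually means AUC computed separately within each query and then averaged, so that the metric reflects ranking quality inside a query and not across queries. The sketch below follows that common definition; the impression-count weighting and the choice to skip single-class queries are assumptions for this example, and the paper's exact formulation may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def qauc(query_ids: Sequence[str], labels: Sequence[int], scores: Sequence[float]) -> float:
    """Per-query AUC averaged over queries, weighted by sample count (assumed weighting)."""
    groups: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for q, y, s in zip(query_ids, labels, scores):
        groups[q][0].append(y)
        groups[q][1].append(s)

    total, weight = 0.0, 0
    for ys, ss in groups.values():
        # AUC is undefined for a query with only positives or only negatives; skip it.
        if 0 < sum(ys) < len(ys):
            total += auc(ys, ss) * len(ys)
            weight += len(ys)
    if weight == 0:
        raise ValueError("no query contains both positive and negative samples")
    return total / weight


# Usage: q1 is ranked perfectly, q2 is ranked in reverse, q3 has one class and is skipped.
value = qauc(
    ["q1", "q1", "q1", "q2", "q2", "q3"],
    [1, 0, 0, 1, 0, 1],
    [0.9, 0.8, 0.1, 0.2, 0.7, 0.5],
)
```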
Abstract
Traditional ID-based recommender systems often struggle with cold-start and generalization challenges. Multimodal recommendation systems, which leverage textual and visual data, offer a promising solution to mitigate these issues. However, existing industrial approaches typically adopt a two-stage training paradigm: first pretraining a multimodal model, then applying its frozen representations to train the recommendation model. This decoupled framework suffers from misalignment between multimodal learning and recommendation objectives, as well as an inability to adapt dynamically to new data. To address these limitations, we propose LEMUR, the first large-scale multimodal recommender system trained end-to-end from raw data. By jointly optimizing both the multimodal and recommendation components, LEMUR ensures tighter alignment with downstream objectives while enabling real-time…
Taxonomy
Topics: Recommender Systems and Techniques · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
