A Metric for MLLM Alignment in Large-scale Recommendation
Yubin Zhang, Yanhua Huang, Haiming Xu, Mingliang Qi, Chang Wang, Jiarui Jin, Xiangyuan Ren, Xiaodan Wang, Ruiwen Xu

TL;DR
This paper introduces the Leakage Impact Score (LIS), a new metric to evaluate the alignment of multimodal large language models in recommendation systems, addressing challenges of dynamic environments and costly online testing.
Contribution
The paper proposes LIS, a scalable metric for evaluating MLLM alignment in recommendation, and validates it through online A/B tests on real-world platforms, with the goal of making alignment evaluation more accurate and MLLM deployment more practical.
Findings
LIS effectively measures the upper bound of preference data.
Online A/B tests show significant improvements in user engagement and advertiser value.
LIS provides actionable insights for deploying MLLMs in real-world recommendation systems.
Abstract
Multimodal recommendation has emerged as a critical technique in modern recommender systems, leveraging content representations from advanced multimodal large language models (MLLMs). To ensure these representations are well-adapted, alignment with the recommender system is essential. However, evaluating the alignment of MLLMs for recommendation presents significant challenges due to three key issues: (1) static benchmarks are inaccurate because of the dynamism of real-world applications, (2) evaluations with the online system, while accurate, are prohibitively expensive at scale, and (3) conventional metrics fail to provide actionable insights when learned representations underperform. To address these challenges, we propose the Leakage Impact Score (LIS), a novel metric for multimodal recommendation. Rather than directly assessing MLLMs, LIS efficiently measures the upper bound of…
