A Metric for MLLM Alignment in Large-scale Recommendation

Yubin Zhang; Yanhua Huang; Haiming Xu; Mingliang Qi; Chang Wang; Jiarui Jin; Xiangyuan Ren; Xiaodan Wang; Ruiwen Xu

arXiv:2508.04963 · cs.IR · August 8, 2025

TL;DR

This paper introduces the Leakage Impact Score (LIS), a new metric to evaluate the alignment of multimodal large language models in recommendation systems, addressing challenges of dynamic environments and costly online testing.

Contribution

The paper proposes LIS, a scalable metric for MLLM alignment in recommendation, validated through online A/B tests on real-world platforms, improving evaluation accuracy and practical deployment.

Findings

01. LIS effectively measures the upper bound of preference data.

02. Online A/B tests show significant improvements in user engagement and advertiser value.

03. LIS provides actionable insights for deploying MLLMs in real-world recommendation systems.

Abstract

Multimodal recommendation has emerged as a critical technique in modern recommender systems, leveraging content representations from advanced multimodal large language models (MLLMs). To ensure these representations are well-adapted, alignment with the recommender system is essential. However, evaluating the alignment of MLLMs for recommendation presents significant challenges due to three key issues: (1) static benchmarks are inaccurate because of the dynamism in real-world applications, (2) evaluations with online systems, while accurate, are prohibitively expensive at scale, and (3) conventional metrics fail to provide actionable insights when learned representations underperform. To address these challenges, we propose the Leakage Impact Score (LIS), a novel metric for multimodal recommendation. Rather than directly assessing MLLMs, LIS efficiently measures the upper bound of…
