Unbiased Multimodal Reranking for Long-Tail Short-Video Search
Wenyi Xu, Feiran Zhu, Songyang Li, Renzhe Zhou, Chao Zhang, Chenglei Dai, Yuren Mao, Yunjun Gao, and Yi Zhang

TL;DR
This paper introduces an LLM-driven multimodal reranking framework that improves long-tail short-video search quality by estimating user experience without relying on user behavior data, enhancing ranking fairness and relevance.
Contribution
The paper presents a novel two-stage training method leveraging multimodal evidence and preference optimization, enabling effective reranking without real user interaction data.
Findings
Achieves consistent offline improvements in AUC, NDCG@K, and human preferences.
Online A/B testing shows significant gains in user experience and consumption metrics.
Effectively promotes high-quality, underexposed videos in long-tail search scenarios.
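For readers unfamiliar with NDCG@K, one of the offline metrics named above, the following is a minimal sketch of its standard definition. It uses linear gain and a log2 position discount; the function names and the graded relevance labels are illustrative assumptions, not the paper's evaluation code.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """NDCG@K: DCG of the given ordering divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        # No relevant items at all: define the score as 0 to avoid division by zero.
        return 0.0
    return dcg_at_k(relevances, k) / ideal


# Hypothetical relevance labels, listed in the order a reranker returned the videos.
reranked = [3, 2, 0, 1]
score = ndcg_at_k(reranked, k=3)
```

A score of 1.0 means the top K items are already in the ideal order; moving a highly relevant video further down the list lowers the score.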
Abstract
With Kuaishou serving hundreds of millions of searches daily, the quality of short-video search is paramount. However, it suffers from a severe Matthew effect on long-tail queries: sparse user behavior data causes models to amplify low-quality content such as clickbait and shallow content. Recent advances in Large Language Models (LLMs) offer a new paradigm, as their inherent world knowledge provides a powerful mechanism to assess content quality independently of sparse user interactions. To this end, we propose an LLM-driven multimodal reranking framework, which estimates user experience without real user behavior. The approach involves a two-stage training process: the first stage uses multimodal evidence to construct high-quality annotations for supervised fine-tuning, while the second stage incorporates pairwise preference optimization to help the model learn partial orderings among…
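The abstract does not specify the exact objective used in the second stage. As a generic illustration of what pairwise preference optimization can look like, here is a sketch of a Bradley-Terry style logistic loss over a (preferred, rejected) pair of scores. The function name and the scalar scores are assumptions made for illustration and should not be read as the authors' formulation.

```python
import math


def pairwise_logistic_loss(score_preferred, score_rejected):
    """Loss log(1 + exp(-(s_preferred - s_rejected))) for one preference pair.

    The loss is small when the preferred item is scored well above the
    rejected one, and large when the order is inverted.
    """
    margin = score_preferred - score_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical model scores for two candidate videos under the same query.
loss_correct_order = pairwise_logistic_loss(2.0, 0.5)
loss_wrong_order = pairwise_logistic_loss(0.5, 2.0)
```

Minimizing this loss over many pairs pushes the model to score the preferred item above the rejected one, which is one common way to learn a partial ordering without absolute labels.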
