Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search
Ziyang Zeng, Heming Jing, Jindong Chen, Xiangli Li, Hongyu Liu, Yixuan He, Zhengyu Li, Yige Sun, Zheyong Xie, Yuqing Yang, Shaosheng Cao, Jun Fan, Yi Wu, Yao Hu

TL;DR
This paper introduces a reinforcement learning framework to improve generative relevance models in Xiaohongshu search by grounding reasoning in business-specific criteria, leading to better relevance and business outcomes.
Contribution
It proposes a novel RL-based training method with Stepwise Advantage Masking for relevance modeling, enhancing interpretability and performance in industrial search systems.
Findings
Significant improvements in relevance metrics
Enhanced robustness and interpretability
Effective model distillation for deployment
Abstract
Ranking relevance is a fundamental task in search engines, aiming to identify the items most relevant to a given user query. Traditional relevance models typically produce scalar scores or directly predict relevance labels, limiting both interpretability and the modeling of complex relevance signals. Inspired by recent advances in Chain-of-Thought (CoT) reasoning for complex tasks, we investigate whether explicit reasoning can enhance both interpretability and performance in relevance modeling. However, existing reasoning-based Generative Relevance Models (GRMs) primarily rely on supervised fine-tuning on large amounts of human-annotated or synthetic CoT data, which often leads to limited generalization. Moreover, domain-agnostic, free-form reasoning tends to be overly generic and insufficiently grounded, limiting its potential to handle the diverse and ambiguous cases prevalent in…
Taxonomy
Topics: Information Retrieval and Search Behavior · Advanced Text Analysis Techniques · Expert finding and Q&A systems
