Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models
Xinyuan Wang, Liang Wu, Yanjie Fu

TL;DR
This paper introduces PageLLM, a reward-based fine-tuning method for large language models in Whole Page Optimization, using mixed-grained rewards to improve search and recommendation presentation based on noisy user feedback.
Contribution
The paper proposes a mixed-grained reward mechanism for fine-tuning LLMs on WPO from noisy user feedback, which reduces reliance on costly manual annotation and improves online performance (a 0.44% GMV increase in A/B testing).
Findings
PageLLM achieves a 0.44% GMV increase in online A/B testing.
The mixed-grained reward mechanism improves both holistic and item-level optimization.
PageLLM outperforms baseline models on public and industrial datasets.
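This summary does not say how the page-level and item-level signals are combined. As a purely illustrative sketch, and not the authors' formulation, one simple way to mix a coarse and a fine reward is a convex combination. The function name `mixed_grained_reward`, the weight `alpha`, and the use of a mean over item rewards are all assumptions introduced here.

```python
from typing import Sequence


def mixed_grained_reward(
    page_reward: float,
    item_rewards: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Blend one page-level reward with the mean of item-level rewards.

    Hypothetical illustration only: `alpha` weights the page-level
    (holistic) signal and `1 - alpha` weights the item-level signal.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # An empty page contributes no item-level signal.
    item_mean = sum(item_rewards) / len(item_rewards) if item_rewards else 0.0
    return alpha * page_reward + (1.0 - alpha) * item_mean


if __name__ == "__main__":
    # Page judged good overall (1.0), but only one of two items was engaged with.
    print(mixed_grained_reward(1.0, [0.0, 1.0], alpha=0.5))
```

With `alpha` near 1 the score follows the whole-page signal; with `alpha` near 0 it follows per-item feedback. How the paper actually balances the two is not stated in this summary.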
Abstract
Optimizing the presentation of search and recommendation results is crucial to enhancing user experience and engagement. Whole Page Optimization (WPO) plays a pivotal role in this process, as it directly influences how information is surfaced to users. While Pre-trained Large Language Models (LLMs) have demonstrated remarkable capabilities in generating coherent and contextually relevant content, fine-tuning these models for complex tasks like WPO presents challenges. Specifically, the need for extensive human-annotated data to mitigate issues such as hallucinations and model instability can be prohibitively expensive, especially in large-scale systems that interact with millions of items daily. In this work, we address the challenge of fine-tuning LLMs for WPO by using user feedback as the supervision. Unlike manually labeled datasets, user feedback is inherently noisy and less…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Multimodal Machine Learning Applications
