Can LLM Annotations Replace User Clicks for Learning to Rank?
Lulu Yu, Keping Bi, Jiafeng Guo, Shihao Liu, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng

TL;DR
This paper compares large language model (LLM) annotations and click data for training ranking models, finding that each excels in different query frequency ranges and proposing combined training strategies to leverage both signals.
Contribution
It provides a comprehensive analysis of LLM annotations versus click data for learning to rank, and introduces methods to integrate both for improved performance.
Findings
Click data outperforms LLM annotations on high-frequency queries.
LLM annotations are more effective on medium- and low-frequency queries.
Combined training strategies improve ranking across all query frequencies.
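To make the frequency split in these findings concrete, here is a minimal sketch of how queries might be bucketed by frequency and routed to a supervision signal. It is an illustration only, not the paper's combined training strategy: the thresholds `high_min` and `mid_min`, the function names, and the hard routing rule are all hypothetical.

```python
from collections import Counter


def bucket_queries(query_log, high_min=100, mid_min=10):
    """Assign each distinct query to a frequency bucket.

    high_min and mid_min are hypothetical thresholds chosen for
    illustration; the summary above does not state the cutoffs used.
    """
    counts = Counter(query_log)
    buckets = {}
    for query, n in counts.items():
        if n >= high_min:
            buckets[query] = "high"
        elif n >= mid_min:
            buckets[query] = "medium"
        else:
            buckets[query] = "low"
    return buckets


def choose_supervision(bucket):
    """Pick a label source per bucket, following the findings:
    clicks for high-frequency queries, LLM annotations otherwise.
    This hard routing rule is an assumption, not the authors' method.
    """
    return "click" if bucket == "high" else "llm"


# Toy query log: one frequent, one moderately frequent, one rare query.
query_log = ["weather"] * 120 + ["python sort list"] * 15 + ["rare query"]
buckets = bucket_queries(query_log)
routing = {q: choose_supervision(b) for q, b in buckets.items()}
```

In this toy log, only the frequent query would be trained on clicks; the other two would fall back to LLM annotations.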
Abstract
Large-scale supervised data is essential for training modern ranking models, but obtaining high-quality human annotations is costly. Click data has been widely used as a low-cost alternative, and with recent advances in large language models (LLMs), LLM-based relevance annotation has emerged as another promising source of supervision. This paper investigates whether LLM annotations can replace click data for learning to rank (LTR) by conducting a comprehensive comparison across multiple dimensions. Experiments on both a public dataset, TianGong-ST, and an industrial dataset, Baidu-Click, show that click-supervised models perform better on high-frequency queries, while LLM annotation-supervised models are more effective on medium- and low-frequency queries. Further analysis shows that click-supervised models are better at capturing document-level signals such as authority or quality, while LLM…
Taxonomy
Topics: Information Retrieval and Search Behavior · Expert finding and Q&A systems · Data Quality and Management
