Reinforcement Learning to Rank Using Coarse-grained Rewards
Yiteng Tu, Zhichao Xu, Tao Yang, Weihang Su, Yujia Zhou, Yiqun Liu, Fen Lin, Qin Liu, Qingyao Ai

TL;DR
This paper explores reinforcement learning for ranking tasks using coarse-grained feedback signals, demonstrating that RL can outperform traditional supervised methods even with less detailed rewards.
Contribution
It introduces new RL-based methods tailored for coarse-grained rewards in ranking and systematically compares them with supervised approaches on large-scale benchmarks.
Findings
RL methods outperform supervised baselines with coarse rewards
RL can effectively optimize ranking without fine-grained labels
Large-scale experiments validate RL's potential in IR tasks
Abstract
Learning to rank (LTR) plays a crucial role in various Information Retrieval (IR) tasks. Although supervised LTR methods based on fine-grained relevance labels (e.g., document-level annotations) have achieved significant success, their reliance on costly and potentially biased annotations limits scalability and alignment with realistic goals. In contrast, coarse-grained feedback signals, such as duration time and session-level engagement, are more accessible and affordable. Reinforcement Learning (RL) offers a promising framework to directly optimize these objectives using reward signals, but most existing Reinforcement Learning to Rank (RLTR) approaches suffer from high variance and low sample efficiency. Motivated by recent advances in large language models (LLMs), we re-examine the problem of RLTR with coarse-grained rewards and propose new RLTR methods based on widely used RL…
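To make the variance problem mentioned in the abstract concrete, here is a minimal sketch of a generic policy-gradient (REINFORCE) update for ranking with a single list-level reward. It is not the paper's method, which the truncated abstract does not specify. The Plackett-Luce policy, the mean-reward baseline, and every name below (`sample_ranking`, `log_prob_grad`, `policy_gradient`, `train`, `reward_fn`) are assumptions chosen for illustration.

```python
import numpy as np


def sample_ranking(scores, rng):
    """Sample a permutation from a Plackett-Luce policy (Gumbel-max trick)."""
    gumbel = rng.gumbel(size=scores.shape)
    return np.argsort(-(scores + gumbel))


def log_prob_grad(scores, ranking):
    """Gradient of log P(ranking | scores) with respect to the scores."""
    grad = np.zeros_like(scores)
    for k in range(len(ranking)):
        remaining = ranking[k:]              # documents not yet placed
        logits = scores[remaining]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                 # softmax over remaining documents
        grad[ranking[k]] += 1.0              # the document chosen at position k
        grad[remaining] -= probs
    return grad


def policy_gradient(scores, reward_fn, rng, num_samples=16):
    """REINFORCE estimate using one coarse reward per sampled ranking.

    Subtracting the mean reward of the sampled rankings is a simple
    baseline (an assumption here) that lowers the estimator's variance.
    """
    rankings = [sample_ranking(scores, rng) for _ in range(num_samples)]
    rewards = np.array([reward_fn(r) for r in rankings], dtype=float)
    advantages = rewards - rewards.mean()
    grads = [a * log_prob_grad(scores, r) for a, r in zip(advantages, rankings)]
    return np.mean(grads, axis=0)


def train(reward_fn, num_docs, steps=200, lr=0.5, seed=0):
    """Gradient ascent on the document scores, starting from all zeros."""
    rng = np.random.default_rng(seed)
    scores = np.zeros(num_docs)
    for _ in range(steps):
        scores += lr * policy_gradient(scores, reward_fn, rng)
    return scores


def top2_reward(ranking, target=3):
    """Hypothetical session-level signal: 1 if the target document is in the top 2."""
    return float(target in ranking[:2])
```

Calling `train(top2_reward, num_docs=5)` returns learned scores for five documents. The reward never says which document was relevant, only whether the whole ranking satisfied the session, which is the coarse-grained setting the abstract describes.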
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Mobile Crowdsensing and Crowdsourcing
