Policy-Gradient Training of Language Models for Ranking
Ge Gao, Jonathan D. Chang, Claire Cardie, Kianté Brantley, Thorsten Joachims

TL;DR
This paper introduces Neural PG-RANK, a policy gradient-based training method for language model retrievers that directly optimizes ranking quality, reducing reliance on heuristics and improving performance in text retrieval tasks.
Contribution
Neural PG-RANK is a novel end-to-end training algorithm that models ranking as a Plackett-Luce policy, aligning training objectives with downstream decision quality.
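A Plackett-Luce policy turns per-document scores into a probability distribution over rankings. Documents are placed one position at a time, and each remaining document is chosen with probability proportional to exp(score). The minimal sketch below shows sampling and the exact log-probability of a ranking. It uses plain float scores where the paper would use the LLM retriever's query-document scores. The function names are hypothetical, and this is a generic illustration of the model rather than the authors' code.

```python
import math
import random
from typing import List, Sequence


def sample_plackett_luce(scores: Sequence[float], rng: random.Random) -> List[int]:
    """Draw one ranking (a permutation of document indices) position by position.

    At each step, a remaining document i is chosen with probability
    exp(scores[i]) / sum of exp(scores[j]) over the remaining documents j.
    """
    remaining = list(range(len(scores)))
    ranking: List[int] = []
    while remaining:
        m = max(scores[i] for i in remaining)  # shift for numerical stability
        weights = [math.exp(scores[i] - m) for i in remaining]
        choice = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(choice)
        remaining.remove(choice)
    return ranking


def plackett_luce_log_prob(scores: Sequence[float], ranking: Sequence[int]) -> float:
    """Log-probability of a full ranking under the Plackett-Luce model."""
    log_p = 0.0
    for pos in range(len(ranking)):
        tail = [scores[i] for i in ranking[pos:]]  # documents not yet placed
        m = max(tail)
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))  # log-sum-exp
        log_p += scores[ranking[pos]] - log_z
    return log_p


# Hypothetical scores for four candidate documents of one query.
scores = [1.5, -0.3, 0.7, 0.0]
rng = random.Random(0)
ranking = sample_plackett_luce(scores, rng)
print(ranking, math.exp(plackett_luce_log_prob(scores, ranking)))
```

Higher-scored documents tend to appear earlier, but every permutation keeps nonzero probability. This stochasticity is what makes the ranking a policy that can be trained with policy gradients.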
Findings
Significant in-domain performance improvements.
Enhanced out-of-domain generalization.
Effective unification of training and decision metrics.
Abstract
Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems. Current state-of-the-art text retrieval models leverage pre-trained large language models (LLMs) to achieve competitive performance, but training LLM-based retrievers via typical contrastive losses requires intricate heuristics, including selecting hard negatives and using additional supervision as learning signals. This reliance on heuristics stems from the fact that the contrastive loss itself is heuristic and does not directly optimize the downstream metrics of decision quality at the end of the processing pipeline. To address this issue, we introduce Neural PG-RANK, a novel training algorithm that learns to rank by instantiating an LLM as a Plackett-Luce ranking policy. Neural PG-RANK provides…
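Training a ranking policy to maximize a downstream metric relies on the score-function (REINFORCE) identity ∇E[Δ(r)] = E[(Δ(r) - b) ∇ log π(r)], where b is any baseline that does not depend on the sampled ranking r. The sketch below is a generic, hypothetical illustration of this idea, not the paper's exact estimator. It makes several assumptions:

- nDCG is used as the utility Δ.
- Rankings are drawn with the Gumbel-max trick, which is equivalent to sequential Plackett-Luce sampling.
- A leave-one-out average of the other samples' utilities serves as b.
- Gradients are returned with respect to raw scores. In an actual LLM retriever they would be backpropagated into the encoder.

```python
import math
import random
from typing import List, Sequence


def ndcg(ranking: Sequence[int], relevance: Sequence[float]) -> float:
    """nDCG of a full ranking with gain relevance / log2(position + 1), positions from 1."""
    dcg = sum(relevance[d] / math.log2(pos + 2) for pos, d in enumerate(ranking))
    ideal = sorted(relevance, reverse=True)
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def sample_ranking(scores: Sequence[float], rng: random.Random) -> List[int]:
    """Plackett-Luce sample via the Gumbel-max trick: sort score + Gumbel(0, 1) noise."""
    keys = [s - math.log(-math.log(max(rng.random(), 1e-300))) for s in scores]
    return sorted(range(len(scores)), key=lambda i: keys[i], reverse=True)


def grad_log_prob(scores: Sequence[float], ranking: Sequence[int]) -> List[float]:
    """Gradient of the Plackett-Luce log-probability w.r.t. each score."""
    grad = [0.0] * len(scores)
    for pos in range(len(ranking)):
        tail = ranking[pos:]
        m = max(scores[i] for i in tail)
        weights = [math.exp(scores[i] - m) for i in tail]
        z = sum(weights)
        grad[ranking[pos]] += 1.0  # the chosen document's score term
        for i, w in zip(tail, weights):
            grad[i] -= w / z  # softmax over documents still available
    return grad


def policy_gradient(
    scores: Sequence[float],
    relevance: Sequence[float],
    num_samples: int,
    rng: random.Random,
) -> List[float]:
    """REINFORCE estimate of d E[nDCG] / d scores with a leave-one-out baseline."""
    if num_samples < 2:
        raise ValueError("leave-one-out baseline needs at least 2 samples")
    rankings = [sample_ranking(scores, rng) for _ in range(num_samples)]
    utilities = [ndcg(r, relevance) for r in rankings]
    total = sum(utilities)
    grad = [0.0] * len(scores)
    for r, u in zip(rankings, utilities):
        baseline = (total - u) / (num_samples - 1)  # mean utility of the other samples
        g = grad_log_prob(scores, r)
        for k in range(len(scores)):
            grad[k] += (u - baseline) * g[k] / num_samples
    return grad


# Hypothetical query: document 0 is the only relevant one, and all scores start equal.
grad = policy_gradient([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], 2000, random.Random(0))
print(grad)  # expected: positive for document 0, negative for documents 1 and 2
```

The metric being optimized is the same one used for evaluation, so no hard-negative selection is involved in this sketch. The Plackett-Luce model is unchanged when a constant is added to every score, so the gradient components sum to zero. Each update only shifts probability mass between documents.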
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Multimodal Machine Learning Applications
