Aligning Web Query Generation with Ranking Objectives via Direct Preference Optimization
João Coelho, Bruno Martins, João Magalhães, Chenyan Xiong

TL;DR
This paper introduces a novel framework using Direct Preference Optimization to improve synthetic web query generation, aligning it more closely with ranking objectives and enhancing retrieval performance.
Contribution
It proposes a new DPO-based method that directly incorporates ranking signals into query generation, improving the quality of synthetic queries for neural retrieval models.
Findings
Higher relevance scores after DPO training
Improved downstream retrieval performance on MS MARCO
Outperforms baseline models trained with synthetic data
Abstract
Neural retrieval models excel in Web search, but their training requires substantial amounts of labeled query-document pairs, which are costly to obtain. With the widespread availability of Web document collections like ClueWeb22, synthetic queries generated by large language models offer a scalable alternative. Still, synthetic training queries often vary in quality, which leads to suboptimal downstream retrieval performance. Existing methods typically filter out noisy query-document pairs based on signals from an external re-ranker. In contrast, we propose a framework that leverages Direct Preference Optimization (DPO) to integrate ranking signals into the query generation process, aiming to directly optimize the model towards generating high-quality queries that maximize downstream retrieval effectiveness. Experiments show higher ranker-assessed relevance between query-document pairs…
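The idea of feeding ranking signals into DPO can be illustrated with a minimal sketch. It assumes that several candidate queries are sampled per document and scored by a re-ranker, and that the highest- and lowest-scored candidates form the (chosen, rejected) preference pair. That pairing rule, the function names `build_preference_pair` and `dpo_loss`, and the value `beta=0.1` are illustrative assumptions, not details taken from the paper. The loss itself is the standard DPO objective computed from sequence log-probabilities under the policy and a frozen reference model.

```python
import math


def build_preference_pair(candidates):
    """Pick a (chosen, rejected) query pair from ranker-scored candidates.

    `candidates` is a list of (query, ranker_score) tuples for one document.
    Assumption: best-vs-worst pairing; the paper may construct pairs differently.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[0][0], ranked[-1][0]


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected queries
    under the trainable policy and the frozen reference model.
    """
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(margin)) == softplus(-margin), written in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical ranker scores for three generated queries about one document.
candidates = [
    ("what is clueweb22", 0.91),
    ("web pages", 0.12),
    ("clueweb22 dataset size", 0.67),
]
chosen, rejected = build_preference_pair(candidates)
loss = dpo_loss(-12.0, -15.0, -13.0, -13.5, beta=0.1)
```

The loss falls below log 2 once the policy assigns relatively more probability to the chosen query than the reference model does, which is what pushes the generator toward queries the ranker scores highly.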
Taxonomy
Topics: Data Management and Algorithms · Web Data Mining and Analysis · Advanced Database Systems and Queries
Methods: Direct Preference Optimization
