Exploring the Viability of Synthetic Query Generation for Relevance Prediction
Aditi Chaudhary, Karthik Raman, Krishna Srinivasan, Kazuma Hashimoto, Mike Bendersky, Marc Najork

TL;DR
This paper critically evaluates the use of synthetic query generation for nuanced relevance prediction in information retrieval, revealing current methods' limitations and proposing label-conditioned models to improve performance.
Contribution
It provides a comprehensive empirical analysis of query generation (QGen) approaches for fine-grained relevance prediction and introduces label-conditioned QGen models to address the shortcomings it identifies.
Findings
QGen approaches underperform transfer learning methods.
Existing QGen models struggle to differentiate relevance levels.
Label-conditioned QGen improves relevance modeling but still faces challenges.
Abstract
Query-document relevance prediction is a critical problem in Information Retrieval systems. This problem has increasingly been tackled using (pretrained) transformer-based models which are finetuned using large collections of labeled data. However, in specialized domains such as e-commerce and healthcare, the viability of this approach is limited by the dearth of large in-domain data. To address this paucity, recent methods leverage these powerful models to generate high-quality task and domain-specific synthetic data. Prior work has largely explored synthetic data generation or query generation (QGen) for Question-Answering (QA) and binary (yes/no) relevance prediction, where, for instance, the QGen models are given a document and trained to generate a query relevant to that document. However, in many problems, we have a more fine-grained notion of relevance than a simple yes/no label.…
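The abstract describes QGen as training a model to map a document to a query relevant to it. The label-conditioned variant named in the summary additionally conditions generation on a target relevance label, which is what lets synthetic data cover grades beyond yes/no. The sketch below shows one way the training inputs and the resulting synthetic data could be laid out under that reading. The three-level label scale, the `relevance: <label> document: <doc>` input template, and the helper names (`qgen_examples`, `label_conditioned_examples`, `synthesize`) are hypothetical illustrations, not the authors' format or code. Any seq2seq model could stand behind the `generate` callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical graded relevance scale; the paper's actual label set is not stated here.
LABELS = ["not relevant", "partially relevant", "highly relevant"]


@dataclass
class Seq2SeqExample:
    source: str  # input text for a seq2seq query generator
    target: str  # query the generator is trained to produce


def qgen_examples(pairs: List[Tuple[str, str]]) -> List[Seq2SeqExample]:
    """Plain QGen: document -> a query relevant to it (binary-relevance setting)."""
    return [Seq2SeqExample(f"document: {doc}", query) for doc, query in pairs]


def label_conditioned_examples(
    triples: List[Tuple[str, str, str]],
) -> List[Seq2SeqExample]:
    """Label-conditioned QGen: (label, document) -> a query with that relevance label."""
    examples = []
    for doc, query, label in triples:
        if label not in LABELS:
            raise ValueError(f"unknown relevance label: {label!r}")
        # Prepend the label so the generator learns label-specific queries.
        examples.append(Seq2SeqExample(f"relevance: {label} document: {doc}", query))
    return examples


def synthesize(
    documents: List[str], generate: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Call a trained generator once per (document, label) pair, producing
    synthetic (query, document, label) triples that cover every relevance grade."""
    data = []
    for doc in documents:
        for label in LABELS:
            query = generate(f"relevance: {label} document: {doc}")
            data.append((query, doc, label))
    return data


if __name__ == "__main__":
    # Stub standing in for a finetuned generator (illustration only).
    def stub(prompt: str) -> str:
        return "q(" + prompt.split(" document: ")[0] + ")"

    for row in synthesize(["wireless earbuds with noise cancelling"], stub):
        print(row)
```

The synthetic triples would then serve as training data for a downstream relevance classifier. Generating one query per label for each document is a design choice in this sketch that keeps the grades balanced; the paper's own sampling strategy may differ.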
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Recommender Systems and Techniques
