Can Generative LLMs Create Query Variants for Test Collections? An Exploratory Study

Marwah Alaofi; Luke Gallagher; Mark Sanderson; Falk Scholer; Paul Thomas

arXiv:2501.17981 · cs.IR · January 31, 2025

TL;DR

This study investigates the potential of large language models to automatically generate query variants for test collections, comparing their effectiveness to human-generated queries in terms of document overlap and utility.

Contribution

It provides an exploratory analysis of LLM-generated query variants, highlighting their potential and limitations in supporting test collection development.

Findings

01

LLM-generated query variants retrieve sets of relevant documents similar to those retrieved by human-generated variants, with up to 71.1% overlap at a pool depth of 100.

02

Generated queries contribute meaningfully to document pooling.

03

LLMs do not fully replicate the diversity of human query variants.

Abstract

This paper explores the utility of a Large Language Model (LLM) to automatically generate queries and query variants from a description of an information need. Given a set of information needs described as backstories, we explore how similar the queries generated by the LLM are to those generated by humans. We quantify the similarity using different metrics and examine how the use of each set would contribute to document pooling when building test collections. Our results show potential in using LLMs to generate query variants. While they may not fully capture the wide variety of human-generated variants, they generate similar sets of relevant documents, reaching up to 71.1% overlap at a pool depth of 100.
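The pooling comparison described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names (`build_pool`, `relevant_overlap`), the toy rankings, and the overlap definition itself (the fraction of relevant documents in the human-variant pool that also appear in the LLM-variant pool) are assumptions made for illustration. The paper may define overlap differently.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Union of the top-`depth` documents across all query variants' rankings."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def relevant_overlap(
    human_runs: Dict[str, List[str]],
    llm_runs: Dict[str, List[str]],
    relevant: Set[str],
    depth: int,
) -> float:
    """Assumed overlap measure: share of relevant documents pooled from
    human variants that are also pooled from LLM variants."""
    human_rel = build_pool(human_runs, depth) & relevant
    llm_rel = build_pool(llm_runs, depth) & relevant
    if not human_rel:
        return 0.0
    return len(human_rel & llm_rel) / len(human_rel)


# Toy data (hypothetical): ranked document IDs per query variant for one topic.
human_runs = {"h1": ["d1", "d2", "d3"], "h2": ["d2", "d4", "d5"]}
llm_runs = {"g1": ["d1", "d4", "d6"], "g2": ["d7", "d2", "d8"]}
relevant = {"d1", "d2", "d4", "d5"}

overlap_at_2 = relevant_overlap(human_runs, llm_runs, relevant, depth=2)
overlap_at_3 = relevant_overlap(human_runs, llm_runs, relevant, depth=3)
print(overlap_at_2, overlap_at_3)
```

In this toy example the overlap drops as the pool deepens, because the human variants surface a relevant document (`d5`) that no LLM variant retrieves. In a real study the rankings would come from running each variant against a retrieval system and the relevance set from the collection's judgments.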

Code & Models

Repositories

MarwahAlaofi/SIGIR-23-SRP-UQV100-GPT-Query-Variants
Official

Taxonomy

Methods: Sparse Evolutionary Training