GenTREC: The First Test Collection Generated by Large Language Models for Evaluating Information Retrieval Systems
Mehmet Deniz Türkmen, Mucahid Kutlu, Bahadir Altun, Gokalp Cosgun

TL;DR
GenTREC introduces a novel, low-cost test collection for evaluating information retrieval systems, generated entirely by large language models, reducing reliance on manual relevance judgments while maintaining evaluation reliability.
Contribution
This paper presents the first IR test collection created solely from LLM-generated documents, demonstrating its effectiveness and compatibility with traditional collections for system evaluation.
Findings
GenTREC's IR system rankings align with traditional collections for key metrics.
The collection contains nearly 97,000 documents and 19,000 relevance judgments.
The approach significantly reduces resource requirements for IR evaluation.
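One way to check whether two test collections rank IR systems alike is a rank correlation over per-system scores. The summary above does not say which measure was used; Kendall's tau is a common choice in IR evaluation and is assumed here. The following is a minimal sketch, with the function name `kendall_tau` and the input shape (a mapping from system name to metric score) chosen for illustration only.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two system rankings.

    scores_a, scores_b: dicts mapping a system name to its score
    (e.g. an effectiveness metric) on collection A and collection B.
    Only systems present in both dicts are compared. Tied pairs count
    as neither concordant nor discordant.
    """
    systems = sorted(set(scores_a) & set(scores_b))
    n_pairs = len(systems) * (len(systems) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two systems scored on both collections")

    concordant = 0
    discordant = 0
    for s, t in combinations(systems, 2):
        # Same sign of the score difference means both collections
        # order this pair of systems the same way.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    return (concordant - discordant) / n_pairs
```

A value of 1.0 means both collections order every pair of systems identically, and -1.0 means the orderings are fully reversed.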
Abstract
Building test collections for Information Retrieval evaluation has traditionally been a resource-intensive and time-consuming task, primarily due to the dependence on manual relevance judgments. While various cost-effective strategies have been explored, the development of such collections remains a significant challenge. In this paper, we present GenTREC, the first test collection constructed entirely from documents generated by a Large Language Model (LLM), eliminating the need for manual relevance judgments. Our approach is based on the assumption that documents generated by an LLM are inherently relevant to the prompts used for their generation. Based on this heuristic, we utilized existing TREC search topics to generate documents. We consider a document relevant only to the prompt that generated it, while other document-topic pairs are treated as non-relevant. To introduce…
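The labeling heuristic described in the abstract (a document is relevant only to the topic whose prompt generated it, and every other document-topic pair is non-relevant) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the `(doc_id, topic_id)` input shape, and the choice to emit lines in the standard TREC qrels layout (`topic iteration docno relevance`) are all hypothetical.

```python
from collections import defaultdict


def build_qrels(generated_docs):
    """Build relevance judgments from generation provenance.

    generated_docs: iterable of (doc_id, topic_id) pairs, where topic_id
    identifies the topic whose prompt produced the document (assumed input
    shape). Each document is marked relevant (1) to that topic only.
    """
    qrels = defaultdict(dict)
    for doc_id, topic_id in generated_docs:
        qrels[topic_id][doc_id] = 1
    return dict(qrels)


def relevance(qrels, topic_id, doc_id):
    """Return 1 if doc_id was generated for topic_id, else 0.

    Pairs that are not recorded are treated as non-relevant, mirroring
    the heuristic described in the abstract.
    """
    return qrels.get(topic_id, {}).get(doc_id, 0)


def to_trec_qrels_lines(qrels):
    """Serialize judgments as 'topic 0 docno relevance' lines."""
    lines = []
    for topic_id in sorted(qrels):
        for doc_id in sorted(qrels[topic_id]):
            lines.append(f"{topic_id} 0 {doc_id} {qrels[topic_id][doc_id]}")
    return lines
```

Only the positive pairs are stored; anything absent is read as non-relevant, so no human assessor is involved at any step of this sketch.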
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Data Quality and Management
Methods: Sparse Evolutionary Training
