Augmented Relevance Datasets with Fine-Tuned Small LLMs

Quentin Fitte-Rey; Matyas Amrouche; Romain Deveaud

arXiv:2504.09816 · cs.IR · April 15, 2025 · 2 citations

Open Access

TL;DR

This paper shows that fine-tuned small LLMs can automate query-document relevance assessment. The resulting labels improve training-dataset quality and ranking model performance in search systems, offering a scalable way to augment datasets.

Contribution

The paper introduces a method for fine-tuning small LLMs to automate query-document relevance labeling, improving the quality of the datasets used to train ranking models for search engines.

Findings

01

Fine-tuned small LLMs outperform certain closed-source models at relevance assessment on the authors' dataset.

02

Training datasets augmented with relevance labels from small LLMs improve ranking model performance.

03

Small LLMs offer a scalable, resource-efficient alternative for dataset creation.

Abstract

Building high-quality datasets and labeling query-document relevance are essential yet resource-intensive tasks, requiring detailed guidelines and substantial effort from human annotators. This paper explores the use of small, fine-tuned large language models (LLMs) to automate relevance assessment, with a focus on improving ranking models' performance by augmenting their training dataset. We fine-tuned small LLMs to enhance relevance assessments, thereby improving the quality of the datasets created for downstream ranking model training. Our experiments demonstrate that these fine-tuned small LLMs not only outperform certain closed-source models on our dataset but also lead to substantial improvements in ranking model performance. These results highlight the potential of leveraging small LLMs for efficient and scalable dataset augmentation, providing a practical solution for search engine…
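The augmentation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Pair` record, the `augment` helper, and the `toy_judge` function are all hypothetical names introduced here, and `toy_judge` is a trivial term-overlap rule standing in for a fine-tuned small LLM. The sketch only shows the general shape of the process, in which unlabeled query-document pairs receive model-assigned relevance labels while existing human labels are kept as they are.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Pair:
    """A query-document pair; label is None until someone judges it."""
    query: str
    document: str
    label: Optional[int] = None   # 1 = relevant, 0 = not relevant
    source: str = "human"         # who produced the label


def toy_judge(query: str, document: str) -> int:
    """Hypothetical stand-in for a fine-tuned small LLM.

    Returns 1 when every query term occurs in the document, else 0.
    A real judge would prompt a model instead of matching terms.
    """
    doc_terms = set(document.lower().split())
    return int(all(term in doc_terms for term in query.lower().split()))


def augment(pairs: Iterable[Pair], judge: Callable[[str, str], int]) -> List[Pair]:
    """Label only the unjudged pairs; keep existing labels untouched."""
    augmented = []
    for pair in pairs:
        if pair.label is None:
            label = judge(pair.query, pair.document)
            augmented.append(Pair(pair.query, pair.document, label, source="model"))
        else:
            augmented.append(pair)
    return augmented


pairs = [
    Pair("red shoes", "cheap red running shoes", label=1),  # human-labeled
    Pair("red shoes", "blue hat"),                          # unlabeled
    Pair("blue hat", "a blue hat for summer"),              # unlabeled
]
result = augment(pairs, toy_judge)
for pair in result:
    print(pair.query, "|", pair.document, "|", pair.label, "|", pair.source)
```

Recording a `source` field on each pair is a design choice of this sketch: it keeps model-assigned labels distinguishable from human ones, so a downstream ranking model can be trained with or without the augmented portion.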

Taxonomy

Topics: Natural Language Processing Techniques · Machine Learning and Data Classification
