Augmented Relevance Datasets with Fine-Tuned Small LLMs
Quentin Fitte-Rey, Matyas Amrouche, Romain Deveaud

TL;DR
This paper shows that fine-tuned small LLMs can automate relevance assessment, improving both dataset quality and ranking model performance in search systems and offering a scalable approach to dataset augmentation.
Contribution
The paper introduces a method for fine-tuning small LLMs to automate query-document relevance labeling, improving the quality of the datasets used to train search ranking models, a novel approach for search ranking.
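One common way to fine-tune a small LLM as a relevance labeler is supervised fine-tuning on prompt/completion pairs built from existing human judgments. The sketch below assumes that setup. The three-level label scale, the prompt wording, and the `Judgment` and `to_sft_example` names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical three-level relevance scale; the paper's label set may differ.
LABELS = {0: "irrelevant", 1: "partially relevant", 2: "relevant"}


@dataclass
class Judgment:
    query: str
    document: str
    grade: int  # human-assigned grade, must be a key of LABELS


def to_sft_example(j: Judgment) -> dict[str, str]:
    """Turn one human judgment into a prompt/completion pair for supervised fine-tuning."""
    prompt = (
        "Rate how relevant the document is to the query.\n"
        f"Query: {j.query}\n"
        f"Document: {j.document}\n"
        f"Answer with one of: {', '.join(LABELS.values())}.\n"
        "Answer:"
    )
    # Common convention: a leading space so the label starts a new word after "Answer:".
    return {"prompt": prompt, "completion": " " + LABELS[j.grade]}


# Toy judgments for illustration only.
judgments = [
    Judgment("running shoes", "Lightweight trail running shoes", 2),
    Judgment("running shoes", "Cast iron skillet, 12 inch", 0),
]
examples = [to_sft_example(j) for j in judgments]
```

The resulting list of prompt/completion dictionaries is the kind of input most supervised fine-tuning tooling accepts; the exact dataset schema depends on the training library used.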
Findings
Fine-tuned small LLMs outperform certain closed-source models on the authors' relevance-labeling dataset.
Datasets augmented with small-LLM relevance labels improve ranking model performance.
Small LLMs offer a scalable, resource-efficient alternative for dataset creation.
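Claims that one relevance labeler outperforms another are usually backed by agreement with human gold labels. The paper's exact metric is not stated here, so the sketch below is only an illustration: it computes Cohen's kappa, a chance-corrected agreement score, between human grades and two hypothetical candidate labelers. All labels are toy values, not results from the paper.

```python
from collections import Counter


def cohen_kappa(a: list[int], b: list[int]) -> float:
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Fraction of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy relevance grades (0 = irrelevant, 1 = partial, 2 = relevant).
human = [2, 0, 1, 2, 0, 1]
candidate_a = [2, 0, 1, 2, 1, 1]  # hypothetical labeler, one disagreement
candidate_b = [2, 2, 1, 0, 0, 1]  # hypothetical labeler, two disagreements
kappa_a = cohen_kappa(human, candidate_a)
kappa_b = cohen_kappa(human, candidate_b)
```

scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic. For graded labels, its `weights="quadratic"` option penalizes off-by-one disagreements less than distant ones.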
Abstract
Building high-quality datasets and labeling query-document relevance are essential yet resource-intensive tasks, requiring detailed guidelines and substantial effort from human annotators. This paper explores the use of small, fine-tuned large language models (LLMs) to automate relevance assessment, with a focus on improving ranking model performance by augmenting the training dataset. We fine-tuned small LLMs to improve their relevance assessments and, in turn, the quality of the datasets created for downstream ranking model training. Our experiments demonstrate that these fine-tuned small LLMs not only outperform certain closed-source models on our dataset but also lead to substantial improvements in ranking model performance. These results highlight the potential of leveraging small LLMs for efficient and scalable dataset augmentation, providing a practical solution for search engine…
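The augmentation step can be pictured with a minimal sketch, assuming integer relevance grades and a labeling function that wraps the fine-tuned model: human judgments are kept as-is, and only the remaining query-document pairs receive LLM labels. `augment` and `stub_llm_label` are hypothetical names, and the stub is a keyword-overlap placeholder standing in for the actual model.

```python
from typing import Callable, Iterable

# (query, document) pair; labels are integer relevance grades (assumption).
Pair = tuple[str, str]


def augment(
    human_labeled: dict[Pair, int],
    unlabeled: Iterable[Pair],
    llm_label: Callable[[str, str], int],
) -> dict[Pair, int]:
    """Merge human judgments with LLM labels for pairs that humans did not judge."""
    augmented = dict(human_labeled)  # copy so the caller's dict is not mutated
    for query, doc in unlabeled:
        if (query, doc) not in augmented:  # never overwrite a human judgment
            augmented[(query, doc)] = llm_label(query, doc)
    return augmented


def stub_llm_label(query: str, doc: str) -> int:
    """Hypothetical placeholder for the fine-tuned small LLM.

    Returns grade 2 if every query word occurs as a substring of the
    document (case-insensitive), else 0.
    """
    text = doc.lower()
    return 2 if all(word in text for word in query.lower().split()) else 0


# Toy data for illustration only.
human = {("running shoes", "Trail running shoes"): 1}
pool = [
    ("running shoes", "Trail running shoes"),
    ("running shoes", "Road running shoes, size 10"),
    ("running shoes", "Cast iron skillet"),
]
train = augment(human, pool, stub_llm_label)
```

Keeping human labels authoritative is one design choice; the stub would have graded the first pair 2, but the human grade of 1 is preserved in `train`.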
Taxonomy
Topics: Natural Language Processing Techniques · Machine Learning and Data Classification
