Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection
Sophia Althammer, Guido Zuccon, Sebastian Hofstätter, Suzan Verberne, Allan Hanbury

TL;DR
This paper evaluates active learning strategies for fine-tuning pretrained language model rankers, finding that current strategies do not outperform random selection and often incur higher costs, highlighting the need for better data selection methods.
Contribution
The study systematically compares active learning strategies with random selection for fine-tuning PLM-based rankers, revealing their limitations and the existence of more effective data subsets.
Findings
Active learning (AL) strategies do not significantly outperform random selection.
AL strategies often require more annotation effort and cost.
Effective data subsets exist but are not identified by current AL methods.
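To make the comparison in these findings concrete, here is a minimal sketch of the two kinds of selection under a fixed annotation budget. It assumes uncertainty sampling as the AL strategy, which is one common family; this summary does not name the strategies the paper evaluates. The function names, query IDs, and scores are hypothetical and for illustration only.

```python
import random


def select_random(pool_ids, budget, seed=0):
    """Baseline: pick `budget` training queries uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(pool_ids, budget)


def select_uncertain(pool_ids, scores, budget):
    """Illustrative AL strategy (uncertainty sampling, an assumption):
    pick the queries whose predicted relevance score is closest to 0.5,
    i.e. where the current ranker is least confident."""
    ranked = sorted(pool_ids, key=lambda q: abs(scores[q] - 0.5))
    return ranked[:budget]


# Hypothetical unlabeled pool and made-up ranker scores in [0, 1].
pool = ["q0", "q1", "q2", "q3", "q4", "q5"]
scores = {"q0": 0.95, "q1": 0.52, "q2": 0.10,
          "q3": 0.45, "q4": 0.80, "q5": 0.30}

random_batch = select_random(pool, budget=2, seed=42)
uncertain_batch = select_uncertain(pool, scores, budget=2)
print("random:", random_batch)
print("uncertain:", uncertain_batch)
```

Note that the uncertainty-based selector needs a scoring pass over the whole pool before each selection round, while the random baseline needs none. That is one way an AL strategy can add cost without a matching gain in effectiveness.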
Abstract
Search methods based on Pretrained Language Models (PLM) have demonstrated great effectiveness gains compared to statistical and early neural ranking models. However, fine-tuning PLM-based rankers requires a great amount of annotated training data. Annotating data involves a large manual effort and thus is expensive, especially in domain-specific tasks. In this paper we investigate fine-tuning PLM-based rankers under limited training data and budget. We investigate two scenarios: fine-tuning a ranker from scratch, and domain adaptation starting with a ranker already fine-tuned on general data, and continuing fine-tuning on a target dataset. We observe a great variability in effectiveness when fine-tuning on different randomly selected subsets of training data. This suggests that it is possible to achieve effectiveness gains by actively selecting a subset of the training data that has…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms
