Chasing Random: Instruction Selection Strategies Fail to Generalize
Harshita Diddee, Daphne Ippolito

TL;DR
This paper critically examines instruction selection strategies for language models, revealing that they often fail to generalize well and may not be cost-effective compared to using full datasets or random subsets.
Contribution
It provides a comprehensive analysis of popular instruction selection methods across various datasets and benchmarks, highlighting their limited generalization and cost-effectiveness.
Findings
Selection strategies often do not outperform random baselines.
Data selection can be more costly than fine-tuning on full datasets.
Data selection strategies yield only limited gains in many scenarios.
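The random baseline that these findings refer to can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `random_subset` and `mean_random_baseline`, the `evaluate` callback, and the choice of seeds are all hypothetical. It assumes the baseline draws a fixed budget of examples uniformly without replacement and averages a score over several seeds.

```python
import random


def random_subset(dataset, budget, seed=0):
    """Uniformly sample `budget` examples without replacement (hypothetical helper)."""
    examples = list(dataset)
    if budget > len(examples):
        raise ValueError("budget exceeds dataset size")
    rng = random.Random(seed)  # local RNG so the draw is reproducible per seed
    return rng.sample(examples, budget)


def mean_random_baseline(dataset, budget, evaluate, seeds=(0, 1, 2)):
    """Average `evaluate(subset)` over several random subsets.

    `evaluate` is an assumed callback that would fine-tune on the subset
    and return a benchmark score; here it can be any function of the subset.
    """
    scores = [evaluate(random_subset(dataset, budget, seed=s)) for s in seeds]
    return sum(scores) / len(scores)
```

Averaging over several seeds matters because a single random draw can be unusually good or bad, so a selection strategy should be compared against the mean random score rather than against one sample.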
Abstract
Prior work has shown that language models can be tuned to follow user instructions using only a small set of high-quality instructions. This has accelerated the development of methods that filter a large, noisy instruction-tuning dataset down to a high-quality subset that works just as well. However, the performance of these methods is typically not demonstrated across a uniform experimental setup, and thus their generalization capabilities are not well established. In this work, we analyze popular selection strategies across different source datasets, selection budgets, and evaluation benchmarks. Our results indicate that selection strategies generalize poorly, often failing to consistently outperform even random baselines. We also analyze the cost-performance trade-offs of using data selection. Our findings reveal that data selection can often exceed the cost of fine-tuning on the full…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Assessment and Pedagogy · Online and Blended Learning
Methods: Sparse Evolutionary Training
