Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose?
Luísa Shimabucoro, Timothy Hospedales, Henry Gouk

TL;DR
This paper critically examines the effectiveness of current few-shot learning benchmarks at the task level, revealing their limitations in reliably evaluating and tuning models for individual tasks.
Contribution
It introduces the first comprehensive analysis of task-level evaluation methods in few-shot learning, highlighting the shortcomings of existing benchmarks and proposing better evaluation strategies.
Findings
Cross-validation with a low number of folds is the best choice for directly estimating a model's performance.
Bootstrapping and cross-validation with a large number of folds are better for model selection.
Existing benchmarks do not reliably evaluate individual task performance.
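To make the estimators named in these findings concrete, here is a minimal sketch of k-fold cross-validation and an out-of-bag bootstrap estimate applied to a single few-shot task. Everything in it is an illustrative assumption rather than the paper's setup: the nearest-centroid classifier, the toy two-class Gaussian task, and the helper names (`make_task`, `kfold_estimate`, `bootstrap_estimate`) are all hypothetical, and the out-of-bag variant is only one of several ways to bootstrap an accuracy estimate.

```python
import random
from statistics import mean


def fit_centroids(xs, ys):
    """Fit a nearest-centroid classifier: one mean vector per label."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*pts)] for y, pts in groups.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def accuracy(centroids, xs, ys):
    hits = sum(predict(centroids, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def kfold_estimate(xs, ys, k, rng):
    """Mean held-out accuracy over k folds (k == len(xs) is leave-one-out)."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of examples")
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # non-stratified split
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            accuracy(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return mean(scores)


def bootstrap_estimate(xs, ys, rounds, rng):
    """Mean out-of-bag accuracy over bootstrap resamples of the task data."""
    n = len(xs)
    scores = []
    for _ in range(rounds):
        train = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(train)
        oob = [i for i in range(n) if i not in chosen]
        if not oob:  # every example was drawn; nothing left to test on
            continue
        model = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        scores.append(accuracy(model, [xs[i] for i in oob], [ys[i] for i in oob]))
    if not scores:
        raise ValueError("no bootstrap round left any out-of-bag examples")
    return mean(scores)


def make_task(rng, shots=5):
    """Toy 2-way task: `shots` points per class from unit-variance Gaussians."""
    xs, ys = [], []
    for label, centre in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(shots):
            xs.append([rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)])
            ys.append(label)
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_task(random.Random(0))
    print("2-fold CV:      ", kfold_estimate(xs, ys, 2, random.Random(1)))
    print("leave-one-out:  ", kfold_estimate(xs, ys, len(xs), random.Random(1)))
    print("bootstrap (OOB):", bootstrap_estimate(xs, ys, 50, random.Random(1)))
```

On a single small task the three estimates can differ noticeably from one another. Judging which is closest to the true accuracy would need a large held-out query set from the same task, which this sketch does not include.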
Abstract
Numerous benchmarks for Few-Shot Learning have been proposed in the last decade. However, all of these benchmarks focus on performance averaged over many tasks, and the question of how to reliably evaluate and tune models trained for individual tasks in this regime has not been addressed. This paper presents the first investigation into task-level evaluation -- a fundamental step when deploying a model. We measure the accuracy of performance estimators in the few-shot setting, consider strategies for model selection, and examine the reasons for the failure of evaluators usually thought of as being robust. We conclude that cross-validation with a low number of folds is the best choice for directly estimating the performance of a model, whereas using bootstrapping or cross-validation with a large number of folds is better for model selection purposes. Overall, we find that existing…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications
