Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose?

Luísa Shimabucoro; Timothy Hospedales; Henry Gouk

arXiv:2307.02732 · cs.LG · July 7, 2023

TL;DR

This paper critically examines the effectiveness of current few-shot learning benchmarks at the task level, revealing their limitations in reliably evaluating and tuning models for individual tasks.

Contribution

The paper presents the first investigation of task-level evaluation in few-shot learning, highlighting the shortcomings of existing benchmarks and recommending better evaluation strategies.

Findings

01

Cross-validation with a small number of folds is best for estimating performance.

02

Bootstrapping and cross-validation with many folds are better for model selection.

03

Existing benchmarks do not reliably evaluate performance on individual tasks.
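To make finding 01 concrete, here is a minimal sketch of a k-fold cross-validation estimate of accuracy computed on the small labelled set of a single task. It assumes a hypothetical nearest-centroid classifier (`nearest_centroid_fit`) as a stand-in for whatever few-shot learner is being evaluated; the function names, the toy `support_set`, and the shuffled round-robin fold assignment are illustrative assumptions, not the paper's code.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[float], int]  # (feature vector, class label)


def nearest_centroid_fit(train: Sequence[Example]) -> Callable[[List[float]], int]:
    """Hypothetical stand-in learner: one mean prototype per class seen in training."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x: List[float]) -> int:
        # Assign x to the class whose prototype is closest in Euclidean distance.
        return min(centroids, key=lambda y: math.dist(x, centroids[y]))

    return predict


def kfold_accuracy(data: Sequence[Example], k: int,
                   fit=nearest_centroid_fit, seed: int = 0) -> float:
    """Estimate task accuracy by k-fold cross-validation over the labelled set."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # disjoint folds covering every index
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        predict = fit(train)
        correct += sum(predict(data[i][0]) == data[i][1] for i in fold)
    return correct / len(data)


# Toy 2-way 5-shot task (illustrative data, not from the paper).
support_set: List[Example] = [
    ([0.0, 0.0], 0), ([0.1, 0.0], 0), ([0.0, 0.1], 0), ([0.1, 0.1], 0), ([0.05, 0.05], 0),
    ([5.0, 5.0], 1), ([5.1, 5.0], 1), ([5.0, 5.1], 1), ([5.1, 5.1], 1), ([5.05, 5.05], 1),
]

if __name__ == "__main__":
    for k in (2, 5, len(support_set)):  # k == len(support_set) is leave-one-out
        print(k, kfold_accuracy(support_set, k))
```

With only a handful of labelled examples, a small `k` leaves fewer examples for training in each split, whereas `k = len(data)` is leave-one-out; the summary above reports that the low-fold variant is the better choice when the goal is to estimate a model's performance directly.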

Abstract

Numerous benchmarks for Few-Shot Learning have been proposed in the last decade. However, all of these benchmarks focus on performance averaged over many tasks, and the question of how to reliably evaluate and tune models trained for individual tasks in this regime has not been addressed. This paper presents the first investigation into task-level evaluation, a fundamental step when deploying a model. We measure the accuracy of performance estimators in the few-shot setting, consider strategies for model selection, and examine the reasons for the failure of evaluators usually thought of as being robust. We conclude that cross-validation with a low number of folds is the best choice for directly estimating the performance of a model, whereas using bootstrapping or cross-validation with a large number of folds is better for model selection purposes. Overall, we find that existing…
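For model selection the estimator only needs to rank candidates correctly. The following is a minimal sketch of bootstrap-based selection on one task, assuming an out-of-bootstrap accuracy estimate (one common bootstrap variant; the exact estimator used in the paper is not given in this summary) and a hypothetical k-nearest-neighbour learner whose `k` is the hyperparameter being tuned. All names and the toy data are illustrative.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Example = Tuple[List[float], int]  # (feature vector, class label)


def knn_predict(train: Sequence[Example], x: List[float], k: int) -> int:
    """Hypothetical candidate model: majority vote among the k nearest training points."""
    neighbours = sorted(train, key=lambda ex: math.dist(x, ex[0]))[:k]
    return Counter(y for _, y in neighbours).most_common(1)[0][0]


def bootstrap_score(data: Sequence[Example], k: int,
                    n_boot: int = 200, seed: int = 0) -> float:
    """Mean out-of-bootstrap accuracy of k-NN over n_boot resamples of data."""
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(n_boot):
        drawn = [rng.randrange(n) for _ in range(n)]  # n indices sampled with replacement
        in_bag = set(drawn)
        oob = [i for i in range(n) if i not in in_bag]  # examples never drawn
        if not oob:
            continue  # nothing held out in this resample; skip it
        train = [data[i] for i in drawn]
        hits = sum(knn_predict(train, data[i][0], k) == data[i][1] for i in oob)
        scores.append(hits / len(oob))
    return sum(scores) / len(scores) if scores else float("nan")


def select_k(data: Sequence[Example], candidates: Sequence[int],
             n_boot: int = 200, seed: int = 0) -> Tuple[int, Dict[int, float]]:
    # Reusing the seed gives every candidate identical resamples,
    # so score differences reflect the candidates, not resampling noise.
    scores = {k: bootstrap_score(data, k, n_boot, seed) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Toy 2-way 5-shot task (illustrative data, not from the paper).
support_set: List[Example] = [
    ([0.0, 0.0], 0), ([0.1, 0.0], 0), ([0.0, 0.1], 0), ([0.1, 0.1], 0), ([0.05, 0.05], 0),
    ([5.0, 5.0], 1), ([5.1, 5.0], 1), ([5.0, 5.1], 1), ([5.1, 5.1], 1), ([5.05, 5.05], 1),
]

if __name__ == "__main__":
    print(select_k(support_set, (1, 3, 5)))
```

Scoring all candidates on the same resamples is a deliberate design choice: the absolute accuracy estimates may still be noisy on so few examples, but the comparison between candidates is paired, which is what selection depends on.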
