Two Sides of Meta-Learning Evaluation: In vs. Out of Distribution
Amrith Setlur, Oscar Li, Virginia Smith

TL;DR
This paper clarifies the distinction between in-distribution and out-of-distribution evaluations in meta-learning, highlighting discrepancies in benchmarks and proposing guidelines for more reliable and comprehensive assessments.
Contribution
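It identifies the mismatch between the in-distribution (ID) setting assumed by most meta-learning theory and the out-of-distribution (OOD) evaluation used by most practical few-shot benchmarks, and offers recommendations for constructing better meta-learning evaluation protocols.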
Findings
Most benchmarks reflect OOD evaluation, not ID.
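Method rankings can differ between ID and OOD settings: methods that perform better on existing OOD benchmarks may perform significantly worse in the ID setting.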
Current OOD benchmarks pose challenges for model selection and comparison.
Abstract
We categorize meta-learning evaluation into two settings: in-distribution (ID), in which the train and test tasks are sampled from the same underlying task distribution, and out-of-distribution (OOD), in which they are not. While most meta-learning theory and some few-shot learning (FSL) applications follow the ID setting, we identify that most existing few-shot classification benchmarks instead reflect OOD evaluation, as they use disjoint sets of train (base) and test (novel) classes for task generation. This discrepancy is problematic because, as we show on numerous benchmarks, meta-learning methods that perform better on existing OOD datasets may perform significantly worse in the ID setting. In addition, in the OOD setting, even though current FSL benchmarks seem befitting, our study highlights concerns in 1) reliably performing model selection for a given…
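To make the ID/OOD distinction concrete, here is a minimal, hypothetical sketch of N-way task generation. It is not the authors' code. The function names `sample_task` and `make_tasks`, the 50/50 base/novel split, and the choice to represent a task only by its set of class labels (ignoring which examples fill the support and query sets) are all illustrative assumptions. In the "ID" branch, train and test tasks draw classes from the same pool, so they share one task distribution. In the "OOD" branch, classes are first split into disjoint base and novel sets, as the abstract describes for common few-shot benchmarks.

```python
import random
from typing import List, Sequence, Set, Tuple


def sample_task(classes: Sequence[str], n_way: int, rng: random.Random) -> Set[str]:
    """Sample one N-way task, represented here only by its distinct class labels."""
    return set(rng.sample(list(classes), n_way))


def make_tasks(
    all_classes: Sequence[str],
    n_way: int,
    n_train_tasks: int,
    n_test_tasks: int,
    setting: str,
    seed: int = 0,
) -> Tuple[List[Set[str]], List[Set[str]]]:
    """Hypothetical task generator contrasting ID and OOD evaluation."""
    rng = random.Random(seed)
    if setting == "ID":
        # Train and test tasks come from the same class pool, i.e. the same
        # task distribution (a real ID protocol would also hold out examples).
        train_pool = test_pool = list(all_classes)
    elif setting == "OOD":
        # Disjoint base (train) and novel (test) classes; the 50/50 split
        # ratio is an arbitrary assumption for illustration.
        shuffled = list(all_classes)
        rng.shuffle(shuffled)
        cut = len(shuffled) // 2
        train_pool, test_pool = shuffled[:cut], shuffled[cut:]
    else:
        raise ValueError(f"unknown setting: {setting!r}")
    train_tasks = [sample_task(train_pool, n_way, rng) for _ in range(n_train_tasks)]
    test_tasks = [sample_task(test_pool, n_way, rng) for _ in range(n_test_tasks)]
    return train_tasks, test_tasks


if __name__ == "__main__":
    classes = [f"c{i}" for i in range(20)]
    train, test = make_tasks(classes, n_way=5, n_train_tasks=100, n_test_tasks=50, setting="OOD")
    # Under OOD generation, no test-task class ever appears in a train task.
    print(set().union(*train).isdisjoint(set().union(*test)))
```

Under this sketch, only the "OOD" branch guarantees that no class seen during meta-training appears at test time. That property is what makes base/novel benchmarks measure OOD rather than ID generalization.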
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Multimodal Machine Learning Applications
