On Measuring the Intrinsic Few-Shot Hardness of Datasets

Xinran Zhao; Shikhar Murty; Christopher D. Manning

arXiv:2211.09113·cs.CL·November 17, 2022

TL;DR

This paper investigates the intrinsic difficulty of datasets for few-shot learning in NLP, proposing a new metric called 'Spread' that predicts dataset hardness efficiently and independently of specific adaptation methods.

Contribution

The paper introduces the 'Spread' metric for measuring a dataset's intrinsic few-shot hardness and shows that it predicts hardness better than existing measures while being cheaper to compute.

Findings

01. Few-shot learning performance is highly correlated across adaptation methods, suggesting that dataset hardness may be intrinsic for a given pre-trained model.

02. The 'Spread' metric predicts dataset hardness better than existing measures.

03. 'Spread' is 8-100x faster to compute than previous hardness metrics.
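
The first finding rests on comparing how different adaptation methods rank the same datasets. A minimal sketch of such a comparison follows, assuming Spearman rank correlation as the statistic (this page does not say which correlation the authors used). The method names and accuracies are made up for illustration and are not results from the paper.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical few-shot accuracies: one list per method, one entry per dataset.
toy_scores = {
    "method_a": [0.91, 0.62, 0.75, 0.55],
    "method_b": [0.88, 0.60, 0.79, 0.51],
    "method_c": [0.93, 0.70, 0.72, 0.58],
}

# Correlation for every pair of methods over the same four datasets.
pairwise = {
    (m1, m2): spearman(toy_scores[m1], toy_scores[m2])
    for m1, m2 in combinations(toy_scores, 2)
}
```

In this toy input all three methods order the four datasets identically, so every pairwise value is 1.0. High pairwise values on real scores would be the pattern the finding describes: datasets that are hard for one method tend to be hard for the others.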

Abstract

While advances in pre-training have led to dramatic improvements in few-shot learning of NLP tasks, there is limited understanding of what drives successful few-shot adaptation in datasets. In particular, given a new dataset and a pre-trained model, what properties of the dataset make it few-shot learnable, and are these properties independent of the specific adaptation techniques used? We consider an extensive set of recent few-shot learning methods, and show that their performance across a large number of datasets is highly correlated, showing that few-shot hardness may be intrinsic to datasets, for a given pre-trained model. To estimate intrinsic few-shot hardness, we then propose a simple and lightweight metric called "Spread" that captures the intuition that few-shot learning is made possible by exploiting feature-space invariances between training and test samples. Our…
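
The abstract is cut off before Spread is defined, so the block below is not the paper's formula. It is a hypothetical proxy that only illustrates the stated intuition: if test samples lie close to training samples in a model's feature space, a few labeled examples may be enough. The function name, the choice of Euclidean distance, and the toy feature vectors are all assumptions made for this sketch.

```python
from math import dist
from statistics import mean


def nearest_train_distance_score(train_features, test_features):
    """Mean Euclidean distance from each test vector to its closest train vector.

    Hypothetical proxy, not the paper's Spread definition. Lower values mean
    the test samples sit closer to the few-shot training samples.
    """
    if not train_features or not test_features:
        raise ValueError("both feature sets must be non-empty")
    nearest = [
        min(dist(test_vec, train_vec) for train_vec in train_features)
        for test_vec in test_features
    ]
    return mean(nearest)


# Made-up 2-D "features"; real features would come from a pre-trained encoder.
toy_train = [(0.0, 0.0), (1.0, 1.0)]
toy_test_near = [(0.0, 0.1), (1.0, 0.9)]
toy_test_far = [(5.0, 5.0), (6.0, 4.0)]

near_score = nearest_train_distance_score(toy_train, toy_test_near)
far_score = nearest_train_distance_score(toy_train, toy_test_far)
```

Under this proxy the test set that sits next to the training points gets a smaller score than the distant one, which matches the direction of the intuition in the abstract. For the actual metric, see the paper and the official repository.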

Code & Models

Repositories

colinzhaoust/intrinsic_fewshot_hardness (official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Topic Modeling

Methods: Test