CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

Subhabrata Mukherjee; Xiaodong Liu; Guoqing Zheng; Saghar Hosseini; Hao Cheng; Greg Yang; Christopher Meek; Ahmed Hassan Awadallah; Jianfeng Gao

arXiv:2111.02570 · cs.CL · November 5, 2021 · 6 cites

TL;DR

This paper introduces CLUES, a standardized benchmark for evaluating the few-shot learning capabilities of natural language understanding models, highlighting performance gaps and differences among models in low-data scenarios.

Contribution

The paper presents CLUES, a new benchmark for consistent evaluation of few-shot NLU, and discusses principles for standardized experimental settings.

Findings

01 Recent models reach human-level performance when given large amounts of labeled data

02 A significant gap to human performance remains in few-shot settings

03 Differences among models and techniques become evident in low-data scenarios

Abstract

Most recent progress in natural language understanding (NLU) has been driven, in part, by benchmarks such as GLUE, SuperGLUE, SQuAD, etc. In fact, many NLU models have now matched or exceeded "human-level" performance on many tasks in these benchmarks. Most of these benchmarks, however, give models access to relatively large amounts of labeled data for training. As such, the models are provided far more data than required by humans to achieve strong performance. That has motivated a line of work that focuses on improving few-shot learning performance of NLU models. However, there is a lack of standardized evaluation benchmarks for few-shot NLU resulting in different experimental settings in different papers. To help accelerate this line of work, we introduce CLUES (Constrained Language Understanding Evaluation Standard), a benchmark for evaluating the few-shot learning capabilities of…

Code & Models

Repositories

microsoft/clues (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications