CLUES: Few-Shot Learning Evaluation in Natural Language Understanding
Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao

TL;DR
This paper introduces CLUES, a standardized benchmark for evaluating the few-shot learning capabilities of natural language understanding models, highlighting performance gaps and differences among models in low-data scenarios.
Contribution
The paper presents CLUES, a new benchmark for consistent evaluation of few-shot NLU, and discusses principles for standardized experimental settings.
Findings
Recent models reach human performance when given large amounts of labeled data
A significant gap to human performance remains in few-shot settings
Differences among models and techniques become evident in low-data scenarios
Abstract
Most recent progress in natural language understanding (NLU) has been driven, in part, by benchmarks such as GLUE, SuperGLUE, SQuAD, etc. In fact, many NLU models have now matched or exceeded "human-level" performance on many tasks in these benchmarks. Most of these benchmarks, however, give models access to relatively large amounts of labeled data for training. As such, the models are provided far more data than required by humans to achieve strong performance. That has motivated a line of work that focuses on improving few-shot learning performance of NLU models. However, there is a lack of standardized evaluation benchmarks for few-shot NLU resulting in different experimental settings in different papers. To help accelerate this line of work, we introduce CLUES (Constrained Language Understanding Evaluation Standard), a benchmark for evaluating the few-shot learning capabilities of…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
