SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

TL;DR
SuperGLUE is a new, more challenging benchmark for evaluating general-purpose language understanding systems, featuring harder tasks, a software toolkit, and a public leaderboard to advance research beyond current models.
Contribution
The paper introduces SuperGLUE, a benchmark styled after GLUE that pairs a new set of more difficult language understanding tasks with a software toolkit and a public leaderboard.
Findings
Performance on GLUE surpasses non-expert human levels.
SuperGLUE provides a more challenging set of tasks.
Benchmark and toolkit are publicly available.
Abstract
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
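The "single-number metric" mentioned in the abstract can be illustrated with a minimal sketch. It assumes the GLUE-style convention of first averaging the metrics within each task and then taking an unweighted average across tasks. The function name `overall_score`, the task names, and all numbers below are hypothetical and are not taken from the paper or its toolkit.

```python
from statistics import mean


def overall_score(task_metrics):
    """Collapse per-task metrics into one benchmark-style number.

    task_metrics maps a task name to a dict of {metric_name: score}.
    Metrics are averaged within each task first, so a task that reports
    two metrics does not count twice in the final average.
    """
    if not task_metrics:
        raise ValueError("need at least one task")
    per_task = {
        task: mean(metrics.values()) for task, metrics in task_metrics.items()
    }
    return mean(per_task.values())


# Hypothetical scores, for illustration only (not real benchmark results).
example = {
    "task_a": {"accuracy": 80.0},
    "task_b": {"f1": 90.0, "accuracy": 70.0},  # two metrics, averaged to 80.0
    "task_c": {"accuracy": 60.0},
}

print(round(overall_score(example), 2))  # prints 73.33
```

Averaging within a task before averaging across tasks keeps every task equally weighted regardless of how many metrics it reports.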
Code & Models
- 🤗 C5i/SEAD-L-6_H-256_A-8-sst2 · model · 6 dl
- 🤗 C5i/SEAD-L-6_H-384_A-12-sst2 · model · 10 dl
- 🤗 C5i/SEAD-L-6_H-384_A-12-mrpc · model · 3 dl
- 🤗 C5i/SEAD-L-6_H-256_A-8-mrpc · model · 2 dl
- 🤗 C5i/SEAD-L-6_H-256_A-8-rte · model · 2 dl
- 🤗 C5i/SEAD-L-6_H-384_A-12-rte · model · 3 dl
- 🤗 C5i/SEAD-L-6_H-256_A-8-stsb · model · 1 dl
- 🤗 C5i/SEAD-L-6_H-384_A-12-stsb · model · 2 dl
- 🤗 C5i/SEAD-L-6_H-256_A-8-qnli · model · 6 dl
- 🤗 C5i/SEAD-L-6_H-384_A-12-qnli · model · 5 dl
Videos
AIs Are Getting Too Smart - Time For A New "IQ Test" 🎓 · YouTube
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
