Benchmarking Long-tail Generalization with Likelihood Splits

Ameya Godbole; Robin Jia

arXiv:2210.06799·cs.CL·May 3, 2023


TL;DR

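This paper introduces Likelihood Splits, a method for building challenging NLP benchmarks by re-splitting existing datasets: examples that a pre-trained language model assigns lower likelihood go to the test set, and more likely examples go to the training set. State-of-the-art models make substantially more errors on these splits than on random splits, and the resulting benchmarks are fairer than those produced by adversarial filtering.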

Contribution

The paper proposes a dataset re-splitting method that uses likelihood scores from a pre-trained language model to place low-likelihood examples in the test set, yielding benchmarks that are harder than random splits and fairer than adversarially filtered ones.

Findings

01

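Relative to random splits, Likelihood Splits raise the error rates of state-of-the-art models by 59% on Spider (semantic parsing), 93% on SNLI (natural language inference), and 33% on BoolQ (yes/no question answering).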

02

Likelihood Splits surface more challenging examples than random splits.

03

The method creates fairer benchmarks than adversarial filtering.

Abstract

In order to reliably process natural language, NLP systems must generalize to the long tail of rare utterances. We propose a method to create challenging benchmarks that require generalizing to the tail of the distribution by re-splitting existing datasets. We create 'Likelihood Splits' where examples that are assigned lower likelihood by a pre-trained language model (LM) are placed in the test set, and more likely examples are in the training set. This simple approach can be customized to construct meaningful train-test splits for a wide range of tasks. Likelihood Splits surface more challenges than random splits: relative error rates of state-of-the-art models increase by 59% for semantic parsing on Spider, 93% for natural language inference on SNLI, and 33% for yes/no question answering on BoolQ, on our splits compared with the corresponding random splits. Moreover, Likelihood Splits…
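The splitting rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `likelihood_split`, `score_fn`, `test_fraction`, and `relative_error_increase` are hypothetical names, the scoring function stands in for whatever log-likelihood a pre-trained LM would assign, and the toy scores and error rates are made up for demonstration. Any task-specific customization the paper applies is not modeled here.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def likelihood_split(
    examples: Sequence[T],
    score_fn: Callable[[T], float],
    test_fraction: float = 0.2,
) -> Tuple[List[T], List[T]]:
    """Return (train, test), sending the lowest-scoring examples to test.

    score_fn is assumed to return an LM log-likelihood (higher = more likely).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    # Rank from least likely to most likely under the scoring model.
    ranked = sorted(examples, key=score_fn)
    n_test = max(1, round(len(ranked) * test_fraction))
    # The least likely examples form the test set; the rest are training data.
    return ranked[n_test:], ranked[:n_test]


def relative_error_increase(err_random: float, err_split: float) -> float:
    """Relative increase in error rate when moving from a random split."""
    return (err_split - err_random) / err_random


# Hypothetical log-likelihoods; a real run would query a pre-trained LM.
toy_scores = {
    "show all singers": -12.0,
    "list the names of students": -15.5,
    "which stadiums hosted no concerts in 2014": -41.2,
    "count the pets": -9.8,
    "find the ids of templates never used by a document": -47.9,
}
train, test = likelihood_split(list(toy_scores), toy_scores.get, test_fraction=0.4)
```

With these toy scores, the two lowest-likelihood utterances land in `test` and the remaining three in `train`. As an illustration of the metric, `relative_error_increase(0.10, 0.159)` is about 0.59, which is how a 59% relative increase would arise from hypothetical error rates of 10% and 15.9%.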


Code & Models

Repositories

ameyagodbole/long-tail-likelihood-splits
Official


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Test