On the Limits of Minimal Pairs in Contrastive Evaluation

Jannis Vamvas; Rico Sennrich

arXiv:2109.07465 · cs.CL · September 16, 2021

TL;DR

This paper critically examines the use of minimal pairs in contrastive evaluation of language models, emphasizing the importance of hypothesis motivation and data selection to ensure reliable insights, and proposes a new evaluation suite for machine translation.

Contribution

The paper identifies two conditions under which contrastive evaluation is predictive of model behavior at large, and introduces an evaluation suite for English-German MT whose minimal pairs are built from machine-generated text.
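
As an illustration of building minimal pairs from machine-generated text, here is a minimal sketch that derives a contrastive variant from a system's own translation rather than from a human reference. It assumes a single, hypothetical perturbation type (German auxiliary-verb agreement) and whitespace tokenization; the names `AGREEMENT_SWAPS` and `make_minimal_pair` are illustrative only and are not taken from the distil-lingeval repository.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical perturbation table (illustrative only): each entry flips the
# grammatical number of a German auxiliary verb, producing an agreement error.
AGREEMENT_SWAPS: Dict[str, str] = {
    "ist": "sind",
    "sind": "ist",
    "hat": "haben",
    "haben": "hat",
}


def make_minimal_pair(system_output: str) -> Optional[Tuple[str, str]]:
    """Return (original, contrastive) built from a machine-generated sentence.

    Only the first matching token is changed, so the two sentences differ in
    exactly one word. Returns None if no token can be perturbed.
    """
    tokens = system_output.split(" ")
    for i, token in enumerate(tokens):
        # Separate trailing punctuation so that "hat." still matches "hat".
        match = re.fullmatch(r"(\w+)(\W*)", token)
        if match and match.group(1) in AGREEMENT_SWAPS:
            word, punct = match.groups()
            perturbed = tokens.copy()
            perturbed[i] = AGREEMENT_SWAPS[word] + punct
            return system_output, " ".join(perturbed)
    return None


# Usage: the unperturbed side is assumed to be an MT system's output.
pair = make_minimal_pair("Der Hund hat den Ball gefunden.")
print(pair)  # → ('Der Hund hat den Ball gefunden.', 'Der Hund haben den Ball gefunden.')
```

Because the unperturbed sentence comes from system output rather than a reference, the pair stays closer to what the model produces at decoding time. A real suite would cover many phenomena and would need more robust tokenization than this sketch uses.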

Findings

1. Contrastive evaluation can produce false positives without proper hypothesis motivation.
2. Using machine-generated text for minimal pairs better approximates deployment conditions.
3. The proposed evaluation suite improves the reliability of contrastive assessments.

Abstract

Minimal sentence pairs are frequently used to analyze the behavior of language models. It is often assumed that model behavior on contrastive pairs is predictive of model behavior at large. We argue that two conditions are necessary for this assumption to hold: First, a tested hypothesis should be well-motivated, since experiments show that contrastive evaluation can lead to false positives. Second, test data should be chosen so as to minimize distributional discrepancy between evaluation time and deployment time. For a good approximation of deployment-time decoding, we recommend that minimal pairs be created based on machine-generated text, as opposed to human-written references. We present a contrastive evaluation suite for English-German MT that implements this recommendation.
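
Contrastive evaluation is typically scored by checking whether the model assigns a higher score to the correct target than to its minimally different contrastive variant; accuracy is the fraction of pairs on which it does. The sketch below shows that comparison under stated assumptions: the scorer is any hypothetical callable returning log P(target | source), and the toy log-probabilities in the usage part are invented placeholders, not model outputs or results from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer type: returns log P(target | source) from some MT model.
Scorer = Callable[[str, str], float]


def contrastive_accuracy(items: List[Tuple[str, str, str]], score: Scorer) -> float:
    """Return the fraction of (source, correct, contrastive) triples for which
    the model scores the correct target strictly higher than the contrastive one."""
    if not items:
        raise ValueError("need at least one test item")
    wins = sum(
        1
        for source, correct, contrastive in items
        if score(source, correct) > score(source, contrastive)
    )
    return wins / len(items)


# Usage with invented toy log-probabilities (placeholders, not model outputs).
TOY_LOGPROBS: Dict[Tuple[str, str], float] = {
    ("The dog has found the ball.", "Der Hund hat den Ball gefunden."): -4.1,
    ("The dog has found the ball.", "Der Hund haben den Ball gefunden."): -7.3,
    ("The cats are hungry.", "Die Katzen sind hungrig."): -5.0,
    ("The cats are hungry.", "Die Katzen ist hungrig."): -4.5,
}


def toy_score(source: str, target: str) -> float:
    # Look up the invented score for this (source, target) pair.
    return TOY_LOGPROBS[(source, target)]


TEST_ITEMS = [
    ("The dog has found the ball.", "Der Hund hat den Ball gefunden.", "Der Hund haben den Ball gefunden."),
    ("The cats are hungry.", "Die Katzen sind hungrig.", "Die Katzen ist hungrig."),
]

print(contrastive_accuracy(TEST_ITEMS, toy_score))  # → 0.5
```

As the abstract argues, a high accuracy from such a comparison is only informative about model behavior at large if the tested hypothesis is well-motivated and the pairs resemble deployment-time output.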

Code & Models

Repositories

zurichnlp/distil-lingeval
PyTorch (official implementation)