A Survey of Parameters Associated with the Quality of Benchmarks in NLP

Swaroop Mishra; Anjana Arunkumar; Chris Bryan; Chitta Baral

arXiv:2210.07566 · cs.CL · October 17, 2022 · 1 citation

TL;DR

This paper surveys bias-related parameters in NLP benchmarks across datasets and tasks, as a first step towards a metric that quantifies benchmark quality.

Contribution

The paper surveys bias-related parameters in NLP benchmarks and proposes an initial set of parameters for quantifying benchmark quality. It is motivated by the limits of current workarounds for bias, which discard low-quality data and cover only limited sets of bias.

Findings

01 Bias parameters vary across datasets and tasks

02 Existing bias metrics are limited and task-specific

03 Proposed parameters can help develop a benchmark quality metric

Abstract

Several benchmarks have been built with heavy investment in resources to track our progress in NLP. Thousands of papers published in response to those benchmarks have competed to top leaderboards, with models often surpassing human performance. However, recent studies have shown that models triumph over several popular benchmarks just by overfitting on spurious biases, without truly learning the desired task. Despite this finding, benchmarking, while trying to tackle bias, still relies on workarounds, which do not fully utilize the resources invested in benchmark creation, due to the discarding of low quality data, and cover limited sets of bias. A potential solution to these issues -- a metric quantifying quality -- remains underexplored. Inspired by successful quality indices in several domains such as power, food, and water, we take the first step towards a metric by identifying…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research