Social Biases in Automatic Evaluation Metrics for NLG

Mingqi Gao; Xiaojun Wan

arXiv:2210.08859 · cs.CL · October 18, 2022 · 1 citation

TL;DR

This paper investigates social biases, especially gender bias, in automatic evaluation metrics for text generation. It finds that some model-based metrics carry these biases, and that gender-swapping the meta-evaluation data changes how well the metrics correlate with human judgments.

Contribution

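It proposes an evaluation method based on the Word Embeddings Association Test (WEAT) and the Sentence Embeddings Association Test (SEAT) to quantify social biases in evaluation metrics, and it constructs gender-swapped meta-evaluation datasets to analyze the impact of gender bias on the evaluation of image captioning and text summarization.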
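For readers unfamiliar with WEAT, the following is a minimal sketch of the standard WEAT effect size as defined for word embeddings (Caliskan et al., 2017). It is not the paper's adaptation to evaluation metrics, which this page does not spell out. The function names, the toy 2-D vectors, and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of target sets X and Y,
    normalized by the standard deviation over all targets in X and Y."""
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    # Assumption: sample standard deviation (ddof=1); implementations differ here.
    pooled_std = np.std(s_X + s_Y, ddof=1)
    return float((np.mean(s_X) - np.mean(s_Y)) / pooled_std)

# Toy, hypothetical embeddings: X leans toward attribute A, Y toward attribute B.
X = [[1.0, 0.0], [0.9, 0.1]]
Y = [[0.0, 1.0], [0.1, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value means the targets in `X` are more strongly associated with `A` than those in `Y` are; swapping `X` and `Y` flips the sign. SEAT applies the same statistic to sentence embeddings instead of word embeddings.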

Findings

01. Social biases are widely present in some model-based evaluation metrics.

02. Gender swapping affects the correlation between metrics and human judgments.

03. Given gender-neutral references, model-based evaluation metrics may prefer the male hypothesis.

Abstract

Many studies have revealed that word embeddings, language models, and models for specific downstream tasks in NLP are prone to social biases, especially gender bias. Recently these techniques have been gradually applied to automatic evaluation metrics for text generation. In the paper, we propose an evaluation method based on Word Embeddings Association Test (WEAT) and Sentence Embeddings Association Test (SEAT) to quantify social biases in evaluation metrics and discover that social biases are also widely present in some model-based automatic evaluation metrics. Moreover, we construct gender-swapped meta-evaluation datasets to explore the potential impact of gender bias in image caption and text summarization tasks. Results show that given gender-neutral references in the evaluation, model-based evaluation metrics may show a preference for the male hypothesis, and the performance of…
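The gender-swapped comparison mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the tiny swap lexicon, the `gender_swap` and `preference_gap` helpers, and the toy `unigram_f1` metric are hypothetical and are not the authors' datasets or code. In practice the `metric` argument would be a model-based metric.

```python
import re

# Hypothetical, deliberately small swap lexicon. "her" is ambiguous
# (it can correspond to "him" or "his"); mapping it to "his" is a simplification.
SWAP = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "boy": "girl", "girl": "boy",
    "his": "her", "him": "her", "her": "his",
}

_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SWAP)) + r")\b", re.IGNORECASE)

def gender_swap(text):
    """Replace each gendered word with its counterpart, keeping initial capitalization."""
    def repl(match):
        word = match.group(0)
        swapped = SWAP[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(repl, text)

def unigram_f1(reference, hypothesis):
    """Toy surface-overlap metric, standing in for a real evaluation metric."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    overlap = sum(min(ref.count(t), hyp.count(t)) for t in set(hyp))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def preference_gap(metric, reference, hypothesis):
    """Score of the hypothesis minus the score of its gender-swapped version."""
    return metric(reference, hypothesis) - metric(reference, gender_swap(hypothesis))

neutral_reference = "a person rides a bike"
male_hypothesis = "a man rides a bike"

print(gender_swap(male_hypothesis))
print(preference_gap(unigram_f1, neutral_reference, male_hypothesis))
```

With a gender-neutral reference, this surface-overlap metric scores the hypothesis and its swapped version identically, so the gap is zero. A nonzero gap from a model-based metric on the same inputs would indicate a preference for one gender.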

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Test