Simpson's Bias in NLP Training

Fei Yuan; Longtu Zhang; Huang Bojun; Yaobo Liang

arXiv:2103.11795·cs.CL·March 23, 2021

TL;DR

This paper investigates Simpson's bias in NLP training, revealing that common sample-level loss functions can be inconsistent with true population metrics, leading to sub-optimal model performance.

Contribution

It provides a systematic theoretical and experimental analysis of Simpson's bias in NLP, highlighting its impact on model training and evaluation.

Findings

01 Popular loss functions may not align with true evaluation metrics.

02 Models optimized with certain losses can be substantially sub-optimal.

03 The paper connects Simpson's bias with classical statistical paradoxes.

Abstract

In most machine learning tasks, we evaluate a model $M$ on a given data population $S$ by measuring a population-level metric $F(S; M)$. Examples of such evaluation metrics $F$ include precision/recall for (binary) recognition, the F1 score for multi-class classification, and the BLEU metric for language generation. On the other hand, the model $M$ is trained by optimizing a sample-level loss $G(S_{t}; M)$ at each learning step $t$, where $S_{t}$ is a subset of $S$ (a.k.a. the mini-batch). Popular choices of $G$ include cross-entropy loss, the Dice loss, and sentence-level BLEU scores. A fundamental assumption behind this paradigm is that the mean value of the sample-level loss $G$, if averaged over all possible samples, should effectively represent the population-level metric $F$ of the task, that is, $E[G(S_{t}; M)] \approx F(S; M)$. In this paper, we systematically investigate…
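To make the assumption $E[G(S_{t}; M)] \approx F(S; M)$ concrete, the following is a minimal sketch of how it can fail on a toy example. It is not taken from the paper: the six-item population `S`, the batch size, the helper `f1`, and the choice of mini-batch F1 as a stand-in for the sample-level quantity $G$ are all illustrative assumptions. The sketch computes F1 once over the whole population and then averages it over every possible mini-batch of size 2.

```python
from itertools import combinations
from statistics import mean


def f1(samples):
    """F1 of the positive class over (gold_label, prediction) pairs."""
    tp = sum(1 for y, p in samples if y == 1 and p == 1)
    fp = sum(1 for y, p in samples if y == 0 and p == 1)
    fn = sum(1 for y, p in samples if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Assumed convention: F1 is 0.0 when a set has no positive labels or predictions.
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy population: one true positive, one false negative,
# one false positive, and three true negatives.
S = [(1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]

# Population-level metric F(S; M).
population_f1 = f1(S)

# Mean of the batch-level quantity G(S_t; M) over all mini-batches of size 2.
batch_size = 2
batches = list(combinations(S, batch_size))
expected_batch_f1 = mean(f1(batch) for batch in batches)

print(f"F(S; M)        = {population_f1:.4f}")
print(f"E[G(S_t; M)]   = {expected_batch_f1:.4f}")
print(f"gap            = {population_f1 - expected_batch_f1:.4f}")
```

In this toy setting the two quantities differ because F1 is a ratio of counts, so averaging per-batch ratios is not the same as taking the ratio of the pooled counts. The size of the gap depends on the assumed zero-denominator convention and on the batch size, so the numbers here should be read as an illustration only.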

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Explainable Artificial Intelligence (XAI)