On the Interpretability and Significance of Bias Metrics in Texts: a PMI-based Approach

Francisco Valentini, Germán Rosati, Damián Blasi, Diego Fernandez Slezak, and Edgar Altszyler

arXiv:2104.06474 · cs.CL · July 19, 2023 · 1 citation

TL;DR

This paper analyzes a PMI-based bias metric for texts that is easier to interpret than embedding-based metrics and supports statistical significance estimation, while producing similar results when measuring real-world gender gaps in large corpora.

Contribution

It analyzes a transparent PMI-based bias metric that can be written as a function of conditional probabilities and approximated by an odds ratio, which enables confidence intervals and significance tests.
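
As a rough sketch of why a PMI difference reduces to conditional probabilities, the identity below may help. The symbols are illustrative assumptions, not notation taken from this page: x is a target word, and A and B are the two groups of context words being compared (for example, female and male terms). The last line is the odds-ratio approximation, which should hold when both conditional probabilities are small.

```latex
\begin{align*}
\mathrm{PMI}(x, A) &= \log \frac{P(x, A)}{P(x)\,P(A)} = \log \frac{P(x \mid A)}{P(x)} \\
\mathrm{Bias}_{\mathrm{PMI}}(x) &= \mathrm{PMI}(x, A) - \mathrm{PMI}(x, B)
  = \log \frac{P(x \mid A)}{P(x \mid B)} \\
&\approx \log \frac{P(x \mid A)\,/\,\bigl(1 - P(x \mid A)\bigr)}
                  {P(x \mid B)\,/\,\bigl(1 - P(x \mid B)\bigr)}
  \quad \text{when } P(x \mid A),\ P(x \mid B) \ll 1
\end{align*}
```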

Findings

01 The PMI-based metric produces results similar to those of embedding-based bias metrics.

02 Because it can be approximated by an odds ratio, it allows estimating confidence intervals and the statistical significance of a bias.

03 The approach captures real-world gender gaps reflected in large corpora.

Abstract

In recent years, word embeddings have been widely used to measure biases in texts. Even if they have proven to be effective in detecting a wide variety of biases, metrics based on word embeddings lack transparency and interpretability. We analyze an alternative PMI-based metric to quantify biases in texts. It can be expressed as a function of conditional probabilities, which provides a simple interpretation in terms of word co-occurrences. We also prove that it can be approximated by an odds ratio, which allows estimating confidence intervals and statistical significance of textual biases. This approach produces similar results to metrics based on word embeddings when capturing gender gaps of the real world embedded in large corpora.
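
The computation described in the abstract can be sketched from raw co-occurrence counts as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pmi_bias_with_ci`, its parameters, and the toy counts are hypothetical, and the interval uses the standard large-sample standard error of a log odds ratio, which may differ from the exact procedure in the paper.

```python
import math


def pmi_bias_with_ci(c_xa, c_a, c_xb, c_b, z=1.96):
    """Hypothetical helper: PMI-based bias of a target word x between groups A and B.

    c_xa: co-occurrences of x with context words of group A
    c_a:  all co-occurrences involving group A context words
    c_xb, c_b: the same counts for group B
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    not_xa = c_a - c_xa  # group A co-occurrences with words other than x
    not_xb = c_b - c_xb  # group B co-occurrences with words other than x
    if min(c_xa, c_xb, not_xa, not_xb) <= 0:
        raise ValueError("all four cells of the 2x2 table must be positive")

    # PMI(x, A) - PMI(x, B) simplifies to log(P(x|A) / P(x|B)).
    bias = math.log((c_xa / c_a) / (c_xb / c_b))

    # Log odds ratio of the 2x2 table, close to `bias` when x is rare in both groups.
    log_or = math.log((c_xa * not_xb) / (c_xb * not_xa))

    # Large-sample standard error of a log odds ratio.
    se = math.sqrt(1 / c_xa + 1 / not_xa + 1 / c_xb + 1 / not_xb)
    return bias, log_or, (log_or - z * se, log_or + z * se)


# Toy counts (made up for illustration only).
bias, log_or, (ci_low, ci_high) = pmi_bias_with_ci(c_xa=30, c_a=10_000, c_xb=10, c_b=10_000)
significant = ci_low > 0 or ci_high < 0  # interval excludes zero
print(f"bias={bias:.3f} log_or={log_or:.3f} ci=({ci_low:.3f}, {ci_high:.3f}) significant={significant}")
```

A positive value indicates that x co-occurs relatively more often with group A than with group B, and an interval that excludes zero suggests the difference is unlikely to be due to sampling noise alone.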

Code & Models

Repositories

ftvalentini/biaspmi
Official

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Topic Modeling