# Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence

**Authors:** Navid Rekabsaz, Robert West, James Henderson, Allan Hanbury

arXiv: 1812.10424 · 2021-04-28

## TL;DR

This paper proposes measuring bias through smoothed first-order co-occurrence relations reconstructed from word embedding models. For occupational words in an English Wikipedia corpus, the resulting gender bias scores correlate more strongly with U.S. job market gender statistics than scores from embedding similarity methods.

## Contribution

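The study proposes an alternative bias measurement approach based on the smoothed first-order co-occurrence between a word and the representative words of a concept. Compared with the common vector similarity-based approaches, it yields higher correlation with actual gender bias statistics of the U.S. job market.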

## Key findings

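- The first-order co-occurrence approach correlates more strongly with U.S. job market gender statistics than vector similarity-based approaches, on two collections and with a variety of word embedding models.
- The first-order approach suggests a more severe bias towards female in a few specific occupations than the other approaches.
- Depending on what one aims to quantify as bias, embedding similarity can introduce non-relevant concepts into the measurement; the first-order approach is proposed as an alternative to avoid this.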
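The correlation comparison in these findings can be illustrated with a small sketch. This summary does not state which correlation coefficient the authors use, so Spearman rank correlation is an assumption here, and the function names `ranks`, `pearson`, and `spearman` are hypothetical helpers written for illustration only.

```python
import math

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up placeholder numbers, NOT taken from the paper:
# one measured bias score and one reference statistic per occupation.
measured = [0.30, -0.10, 0.05, -0.25]
reference = [0.90, 0.20, 0.55, 0.10]
rho = spearman(measured, reference)
```

Under this setup, a bias measure would be judged better when its scores give a higher `rho` against the reference statistics for the same list of occupations.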

## Abstract

Text corpora are widely used resources for measuring societal biases and stereotypes. The common approach to measuring such biases using a corpus is by calculating the similarities between the embedding vector of a word (like nurse) and the vectors of the representative words of the concepts of interest (such as genders). In this study, we show that, depending on what one aims to quantify as bias, this commonly-used approach can introduce non-relevant concepts into bias measurement. We propose an alternative approach to bias measurement utilizing the smoothed first-order co-occurrence relations between the word and the representative concept words, which we derive by reconstructing the co-occurrence estimates inherent in word embedding models. We compare these approaches by conducting several experiments on the scenario of measuring gender bias of occupational words, according to an English Wikipedia corpus. Our experiments show higher correlations of the measured gender bias with the actual gender bias statistics of the U.S. job market - on two collections and with a variety of word embedding models - using the first-order approach in comparison with the vector similarity-based approaches. The first-order approach also suggests a more severe bias towards female in a few specific occupations than the other approaches.
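The contrast between the two approaches described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a skip-gram style model that exposes both word vectors and context vectors, and it treats the sigmoid of their dot product as the model's co-occurrence estimate. The smoothing and normalization used in the paper are not described in this summary and are omitted. All function names and the toy vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def similarity_bias(word_vec, female_vecs, male_vecs):
    """Common approach: difference of mean cosine similarities between
    the word vector and the vectors of each gender's representative words."""
    female = mean(cosine(word_vec, v) for v in female_vecs)
    male = mean(cosine(word_vec, v) for v in male_vecs)
    return female - male

def first_order_bias(word_vec, female_ctx_vecs, male_ctx_vecs):
    """First-order sketch (assumption): difference of mean co-occurrence
    estimates, taken as sigmoid(word vector . context vector)."""
    female = mean(sigmoid(dot(word_vec, c)) for c in female_ctx_vecs)
    male = mean(sigmoid(dot(word_vec, c)) for c in male_ctx_vecs)
    return female - male

# Toy 2-d vectors chosen by hand for illustration, not real embeddings.
nurse = [0.2, 0.9]
she = [0.1, 1.0]
he = [1.0, 0.1]

sim_score = similarity_bias(nurse, [she], [he])
fo_score = first_order_bias(nurse, [she], [he])
```

In both functions a positive score means the word leans towards the female representative words and a negative score towards the male ones. The difference is what is compared: the first measures how alike the word and gender vectors are, while the second estimates how often the word appears together with the gender words.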

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.10424/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1812.10424/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1812.10424/full.md

---
Source: https://tomesphere.com/paper/1812.10424