Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition

Isaac Slaughter; Craig Greenberg; Reva Schwartz; Aylin Caliskan

arXiv:2310.18877 · cs.CL · October 31, 2023 · 2 citations

TL;DR

This paper introduces the SpEAT, a method to detect human-like biases in pre-trained speech models, revealing that these biases can influence speech emotion recognition outcomes.

Contribution

The study develops the SpEAT to quantify biases in speech models and demonstrates their presence and impact on downstream emotion recognition tasks.

Findings

1. Most models show positive valence biases toward certain social groups.
2. Biases in pre-trained models often propagate to emotion recognition results.
3. Pre-trained speech models frequently learn and reflect human-like biases.

Abstract

Previous work has established that a person's demographics and speech style affect how well speech processing models perform for them. But where does this bias come from? In this work, we present the Speech Embedding Association Test (SpEAT), a method for detecting bias in one type of model used for many speech tasks: pre-trained models. The SpEAT is inspired by word embedding association tests in natural language processing, which quantify intrinsic bias in a model's representations of different concepts, such as race or valence (something's pleasantness or unpleasantness) and capture the extent to which a model trained on large-scale socio-cultural data has learned human-like biases. Using the SpEAT, we test for six types of bias in 16 English speech models (including 4 models also trained on multilingual data), which come from the wav2vec 2.0, HuBERT, WavLM, and Whisper model…
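The SpEAT is described above as inspired by word embedding association tests. As a minimal sketch, assuming the SpEAT keeps the WEAT-style effect-size statistic and that each audio clip has already been reduced to one fixed-length vector (mean-pooling a model's frame-level outputs is an assumption here, not something stated on this page), the effect size for target sets X, Y and attribute sets A, B (e.g. pleasant vs. unpleasant clips) could be computed as below. All function names are hypothetical and are not taken from the isaaconline/speat repository.

```python
import math
import statistics
from typing import List, Sequence

Vector = Sequence[float]


def mean_pool(frames: Sequence[Vector]) -> List[float]:
    """Hypothetical helper: average frame-level vectors into one clip embedding."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w: Vector, A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus mean similarity to B."""
    return (statistics.mean(cosine(w, a) for a in A)
            - statistics.mean(cosine(w, b) for b in B))


def effect_size(X, Y, A, B) -> float:
    """WEAT-style effect size: difference in mean association of target sets
    X and Y, divided by the standard deviation of associations over X union Y.
    Uses the sample standard deviation (an assumption; some implementations
    use the population form instead)."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


# Toy 2-D example: X lies near attribute A and Y near attribute B.
X = [mean_pool([[1.0, 0.0], [1.0, 0.0]]), [0.9, 0.1]]
Y = [[0.0, 1.0], [0.1, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
print(effect_size(X, Y, A, B))  # a positive value: X associates more with A
```

Swapping X and Y flips the sign of the effect size, and a value near zero suggests no differential association. A real test would also report a significance level (for example via a permutation test over X union Y), which this sketch omits.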

Code & Models

Repositories

isaaconline/speat
(Official)

Taxonomy

Topics: Speech Recognition and Synthesis