Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
Yi-Cheng Lin, Wei-Chih Chen, Hung-yi Lee

TL;DR
This paper introduces Spoken Stereoset, a dataset for evaluating social biases in Speech Large Language Models, revealing that most models exhibit minimal bias but some still show stereotypical tendencies.
Contribution
The study presents Spoken Stereoset, the first dataset specifically designed to measure social biases in speech-based large language models.
Findings
Most models show minimal bias
Some models exhibit stereotypical tendencies
Bias levels vary across models
Abstract
Warning: This paper may contain texts with uncomfortable content. Large Language Models (LLMs) have achieved remarkable performance in various tasks, including those involving multimodal data like speech. However, these models often exhibit biases due to the nature of their training data. Recently, more Speech Large Language Models (SLLMs) have emerged, underscoring the urgent need to address these biases. This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in SLLMs. By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. Our experiments reveal significant insights into their performance and bias levels. The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems
