Dialect and Gender Bias in YouTube's Spanish Captioning System
Iris Dania Jimenez, Christoph Kern

TL;DR
This paper investigates dialect and gender bias in YouTube's Spanish captioning system by comparing caption quality for female and male speakers across Spanish dialects, revealing systematic disparities that highlight the need for more inclusive algorithmic calibration.
Contribution
It provides the first comprehensive analysis of dialect and gender biases in YouTube's Spanish automatic captions, emphasizing the importance of addressing linguistic diversity in speech recognition systems.
Findings
Identified systematic disparities in caption quality across Spanish dialects.
Found gender-based performance differences in caption accuracy.
Highlighted the need for bias mitigation in speech recognition technologies.
Abstract
Spanish is the official language of twenty-one countries and is spoken by over 441 million people. Naturally, there are many variations in how Spanish is spoken across these countries. Media platforms such as YouTube rely on automatic speech recognition systems to make their content accessible to different groups of users. However, YouTube offers only one option for automatically generating captions in Spanish. This raises the question: could this captioning system be biased against certain Spanish dialects? This study examines the potential biases in YouTube's automatic captioning system by analyzing its performance across various Spanish dialects. By comparing the quality of captions for female and male speakers from different regions, we identify systematic disparities which can be attributed to specific dialects. Our study provides further evidence that algorithmic technologies…
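The comparison described in the abstract can be illustrated with a small sketch. The visible text does not name the caption-quality metric, so the code below assumes word error rate (WER), a common measure for speech recognition output. The function names, the sample fields (`dialect`, `gender`, `reference`, `caption`), and the grouping scheme are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: WER is used only as an illustrative quality metric here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer_by_group(samples):
    """Average WER per (dialect, gender) pair.

    Each sample is a dict with the hypothetical keys
    'dialect', 'gender', 'reference', and 'caption'.
    """
    scores = defaultdict(list)
    for sample in samples:
        key = (sample["dialect"], sample["gender"])
        scores[key].append(
            word_error_rate(sample["reference"], sample["caption"])
        )
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}
```

Given human-verified reference transcripts alongside the automatic captions, `mean_wer_by_group` returns one average per dialect and gender pair. A persistent gap between groups would be the kind of systematic disparity the abstract refers to, although a real analysis would also need to control for audio quality, topic, and sample size.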
Taxonomy
Topics: Subtitles and Audiovisual Media · Translation Studies and Practices · Multilingual Education and Policy
