Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music
Shivam Chauhan, Ajay Pundhir

TL;DR
This study evaluates cross-cultural bias in mel-scale audio representations across speech recognition, music analysis, and acoustic scene classification, and shows that alternative front-ends can substantially reduce the resulting performance disparities.
Contribution
The paper provides a comprehensive analysis of cultural biases in audio front-ends and introduces alternative representations that mitigate these biases.
Findings
Mel-scale features show significant performance gaps between tonal and non-tonal languages.
Alternative representations like LEAF and CQT substantially reduce cross-cultural disparities.
Adaptive frequency decomposition improves fairness with minimal computational cost.
Abstract
Modern audio systems universally employ mel-scale representations derived from 1940s Western psychoacoustic studies, potentially encoding cultural biases that create systematic performance disparities. We present a comprehensive evaluation of cross-cultural bias in audio front-ends, comparing mel-scale features with learnable alternatives (LEAF, SincNet) and psychoacoustic variants (ERB, Bark, CQT) across speech recognition (11 languages), music analysis (6 collections), and acoustic scene classification (10 European cities). Our controlled experiments isolate front-end contributions by keeping the architecture and training protocol minimal and constant. Results demonstrate that mel-scale features yield 31.2% WER for tonal languages compared to 18.7% for non-tonal languages (a gap of 12.5 percentage points), and show 15.7% F1 degradation between Western and non-Western music. Alternative…
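To make the comparison between mel and the psychoacoustic variants concrete, the sketch below maps frequency in Hz onto the mel, ERB-rate, and Bark scales and reports how much of each scale's range falls below a low-frequency cutoff. The formulas are common textbook variants (HTK-style mel, Glasberg and Moore ERB-rate, Traunmüller Bark) and are not necessarily the ones used in the paper. The helper `share_below`, the 500 Hz cutoff, and the 8 kHz upper limit are illustrative assumptions, chosen only because the fundamental frequency of speech typically lies in that low region.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (one common variant; others exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_erb_rate(f_hz: float) -> float:
    """ERB-rate scale in the Glasberg and Moore form."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def hz_to_bark(f_hz: float) -> float:
    """Bark scale, Traunmueller approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def share_below(scale, cutoff_hz: float, fmax_hz: float = 8000.0) -> float:
    """Fraction of the scale's 0..fmax_hz range that lies below cutoff_hz.

    Hypothetical helper for illustration: a larger value means that a
    filterbank spaced uniformly on this scale places more of its bands
    below the cutoff.
    """
    lo = scale(0.0)
    return (scale(cutoff_hz) - lo) / (scale(fmax_hz) - lo)


SCALES = {"mel": hz_to_mel, "erb": hz_to_erb_rate, "bark": hz_to_bark}

if __name__ == "__main__":
    for name, scale in SCALES.items():
        share = share_below(scale, cutoff_hz=500.0)
        print(f"{name:>4}: {share:.3f} of the 0-8 kHz range lies below 500 Hz")
```

This only compares how fixed scales distribute resolution across frequency. It says nothing about downstream accuracy, which depends on the model and data, and it does not cover the learnable front-ends (LEAF, SincNet) or CQT.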
