Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music

Shivam Chauhan; Ajay Pundhir

arXiv:2604.10503 · cs.SD · April 14, 2026

TL;DR

This study evaluates cross-cultural bias in mel-scale audio representations across speech recognition, music analysis, and acoustic scene classification, and reports that alternative front-ends can significantly reduce the resulting performance disparities.

Contribution

The paper provides a comprehensive analysis of cultural biases in audio front-ends and introduces alternative representations that mitigate these biases.

Findings

01. Mel-scale features show a significant performance gap between tonal and non-tonal languages (31.2% vs. 18.7% WER).

02. Alternative representations like LEAF and CQT substantially reduce cross-cultural disparities.

03. Adaptive frequency decomposition improves fairness with minimal computational cost.

Abstract

Modern audio systems universally employ mel-scale representations derived from 1940s Western psychoacoustic studies, potentially encoding cultural biases that create systematic performance disparities. We present a comprehensive evaluation of cross-cultural bias in audio front-ends, comparing mel-scale features with learnable alternatives (LEAF, SincNet) and psychoacoustic variants (ERB, Bark, CQT) across speech recognition (11 languages), music analysis (6 collections), and acoustic scene classification (10 European cities). Our controlled experiments isolate front-end contributions by keeping the architecture and training protocol minimal and constant. Results demonstrate that mel-scale features yield 31.2% WER for tonal languages compared to 18.7% for non-tonal languages (a gap of 12.5 percentage points), and show 15.7% F1 degradation between Western and non-Western music. Alternative…
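
To make the front-end comparison more concrete, the sketch below contrasts how a mel filterbank and a constant-Q (CQT-style) layout distribute their center frequencies. It is illustrative only and is not the authors' implementation: the abstract does not state which mel formula or filterbank settings were used, so the widely used HTK-style formula is assumed here, and the band counts, frequency limits, and the 80 to 400 Hz "pitch region" are hypothetical values chosen for the example.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel formula (an assumption; the paper may use another variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands: int, f_min: float, f_max: float) -> list[float]:
    """Return n_bands + 2 frequencies (Hz) spaced uniformly on the mel scale.

    Interior points are the centers of triangular filters; the first and
    last points are the outer edges of the filterbank.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]


def cqt_center_freqs(n_bins: int, f_min: float, bins_per_octave: int) -> list[float]:
    """Geometrically spaced center frequencies, as in a constant-Q transform."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def count_in_range(freqs: list[float], lo_hz: float, hi_hz: float) -> int:
    """Count how many frequencies fall inside [lo_hz, hi_hz]."""
    return sum(1 for f in freqs if lo_hz <= f <= hi_hz)


if __name__ == "__main__":
    # Hypothetical settings, not taken from the paper.
    mel_centers = mel_band_edges(40, 0.0, 8000.0)[1:-1]
    cqt_centers = cqt_center_freqs(84, 32.70, 12)

    # Illustrative low-frequency region where pitch information is concentrated.
    lo_hz, hi_hz = 80.0, 400.0
    print("mel centers in range:", count_in_range(mel_centers, lo_hz, hi_hz))
    print("CQT centers in range:", count_in_range(cqt_centers, lo_hz, hi_hz))
```

Running the script prints how many filter centers each layout places in the chosen low-frequency region. Changing `n_bands`, `bins_per_octave`, or the frequency limits shows how strongly the allocation depends on front-end design choices, which is the kind of contribution the controlled experiments aim to isolate.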
