Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe
Tiantian Feng, Kevin Huang, Anfeng Xu, Xuan Shi, Thanathai Lertpetchpun, Jihwan Lee, Yoonjeong Lee, Dani Byrd, Shrikanth Narayanan

TL;DR
Voxlect is a comprehensive benchmark for evaluating speech foundation models' ability to classify dialects and regional languages globally, supporting diverse applications like ASR analysis and speech generation evaluation.
Contribution
The paper introduces Voxlect, a benchmark built from over 2 million training utterances drawn from 30 publicly available speech corpora with dialect labels, enabling systematic evaluation of speech foundation models across multiple languages and dialects.
Findings
Models show varying accuracy across dialects and languages.
Robustness of dialect classification under noisy conditions is assessed.
Voxlect supports downstream applications such as analyzing ASR performance across dialects and evaluating speech generation systems.
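One common way to probe robustness under noisy conditions is to mix noise into clean speech at a controlled signal-to-noise ratio (SNR) and re-run the classifier on the result. The summary here does not state which noise types or SNR levels the paper uses, so the following is only a minimal sketch under assumed settings: the function names `mix_at_snr` and `measured_snr_db`, the 10 dB target, and the synthetic tone and Gaussian noise are all illustrative choices, not the authors' protocol.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the Voxlect code.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Target noise power follows from SNR_dB = 10 * log10(P_speech / P_noise).
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """SNR in dB of a mixture, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixture)]
    speech_power = sum(s * s for s in speech) / len(speech)
    residual_power = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(speech_power / residual_power)


# Synthetic stand-ins for real audio (assumed): a 220 Hz tone and Gaussian noise.
rng = random.Random(0)
sample_rate = 16000
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Assumed SNR level, chosen only for the example.
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

In an actual evaluation, `noisy` would be passed to the dialect classifier in place of the clean waveform, and accuracy would be compared across several SNR levels.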
Abstract
We present Voxlect, a novel benchmark for modeling dialects and regional languages worldwide using speech foundation models. Specifically, we report comprehensive benchmark evaluations on dialects and regional language varieties in English, Arabic, Mandarin and Cantonese, Tibetan, Indic languages, Thai, Spanish, French, German, Brazilian Portuguese, and Italian. Our study used over 2 million training utterances from 30 publicly available speech corpora that are provided with dialectal information. We evaluate the performance of several widely used speech foundation models in classifying speech dialects. We assess the robustness of the dialectal models under noisy conditions and present an error analysis that highlights modeling results aligned with geographic continuity. In addition to benchmarking dialect classification, we demonstrate several downstream applications enabled by…
Code & Models
- 🤗 tiantiaf/voxlect-spanish-dialect-whisper-large-v3 · model · 12 dl · ♡ 4
- 🤗 tiantiaf/voxlect-indic-lid-whisper-large-v3 · model · 7 dl · ♡ 1
- 🤗 tiantiaf/voxlect-mandarin-cantonese-dialect-whisper-large-v3 · model · 10 dl · ♡ 1
- 🤗 tiantiaf/voxlect-french-dialect-whisper-large-v3 · model
- 🤗 tiantiaf/voxlect-thai-dialect-whisper-large-v3 · model · 5 dl
- 🤗 tiantiaf/voxlect-german-dialect-whisper-large-v3 · model · 4 dl · ♡ 1
- 🤗 tiantiaf/voxlect-spanish-dialect-mms-lid-256 · model · 42 dl · ♡ 2
- 🤗 tiantiaf/voxlect-indic-lid-mms-lid-256 · model · 78 dl · ♡ 1
- 🤗 tiantiaf/voxlect-french-dialect-mms-lid-256 · model · 125 dl · ♡ 1
- 🤗 tiantiaf/voxlect-thai-dialect-mms-lid-256 · model
Taxonomy
Topics: Authorship Attribution and Profiling · Speech Recognition and Synthesis · Linguistic Variation and Morphology
