Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe

Tiantian Feng; Kevin Huang; Anfeng Xu; Xuan Shi; Thanathai Lertpetchpun; Jihwan Lee; Yoonjeong Lee; Dani Byrd; Shrikanth Narayanan

arXiv:2508.01691·cs.SD·August 5, 2025

TL;DR

Voxlect is a comprehensive benchmark for evaluating speech foundation models' ability to classify dialects and regional languages globally, supporting diverse applications like ASR analysis and speech generation evaluation.

Contribution

The paper introduces Voxlect, a new benchmark with extensive dialectal speech data, enabling systematic evaluation of speech models across multiple languages and dialects.

Findings

01. Accuracy varies across models, dialects, and languages.

02. The robustness of dialect classification is assessed under noisy conditions.

03. Voxlect supports downstream applications such as ASR analysis and speech generation evaluation.

Abstract

We present Voxlect, a novel benchmark for modeling dialects and regional languages worldwide using speech foundation models. Specifically, we report comprehensive benchmark evaluations on dialects and regional language varieties in English, Arabic, Mandarin and Cantonese, Tibetan, Indic languages, Thai, Spanish, French, German, Brazilian Portuguese, and Italian. Our study used over 2 million training utterances from 30 publicly available speech corpora that are provided with dialectal information. We evaluate the performance of several widely used speech foundation models in classifying speech dialects. We assess the robustness of the dialectal models under noisy conditions and present an error analysis that highlights modeling results aligned with geographic continuity. In addition to benchmarking dialect classification, we demonstrate several downstream applications enabled by…

Taxonomy

Topics: Authorship Attribution and Profiling · Speech Recognition and Synthesis · Linguistic Variation and Morphology