AfriVox-v2: A Domain-Verticalized Benchmark for In-the-Wild African Speech Recognition
Busayo Awobade, Gabrial Zencha Ashungafac, Tobi Olatunji

TL;DR
AfriVox-v2 is a comprehensive benchmark for African speech recognition, evaluating models in realistic, noisy, and domain-specific conditions to identify generalization gaps and guide localized voice AI development.
Contribution
AfriVox-v2 introduces unscripted "in the wild" audio and strict domain verticalization, and it benchmarks a new generation of speech models, addressing the lack of realistic evaluation benchmarks for African languages.
Findings
Modern speech models show significant generalization gaps in African contexts.
The benchmark reveals performance disparities across sectors like health and finance.
New models like Sahara-v2 and Gemini 3 Flash are evaluated under realistic conditions.
Abstract
Recent large language models (LLMs) show strong speech recognition and translation capabilities for high-resource languages. However, African languages remain dramatically underrepresented in benchmarks, limiting their practical use in low-resource settings. While early benchmarks tested African languages and accents, they lacked exhaustive real-world noise and granular domain evaluations. We present AfriVox-v2, a comprehensive benchmark designed to test speech models under realistic African deployment conditions. AfriVox-v2 introduces "in the wild" unscripted audio for all supported languages. We also introduce strict domain verticalization, evaluating model accuracy across ten sectors, including government, finance, health, and agriculture, and conducting targeted tests on numbers and named entities. Finally, we benchmark a new generation of speech models, including Sahara-v2, Gemini 3…
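To make the idea of domain verticalization concrete, the following is a minimal sketch of scoring transcripts per sector. It assumes word error rate (WER) as the accuracy metric, which the abstract does not specify; the function names `word_errors` and `wer_by_domain` and the sample utterances are hypothetical illustrations, not part of the AfriVox-v2 release.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_domain(samples):
    """Aggregate WER per domain from (domain, reference, hypothesis) triples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for domain, ref, hyp in samples:
        e, n = word_errors(ref, hyp)
        errors[domain] += e
        words[domain] += n
    # Pool errors over all reference words in a domain (corpus-level WER).
    return {d: errors[d] / words[d] for d in errors if words[d] > 0}


# Hypothetical examples; the domain labels follow sectors named in the abstract.
samples = [
    ("health", "take two tablets daily", "take two tablet daily"),
    ("finance", "transfer five thousand naira", "transfer five thousand naira"),
    ("finance", "check my account balance", "check account balance"),
]
scores = wer_by_domain(samples)
for domain, wer in sorted(scores.items()):
    print(f"{domain}: WER = {wer:.3f}")
```

Pooling errors over all reference words in a domain, rather than averaging per-utterance rates, keeps short utterances from dominating a sector's score. A real evaluation would also need text normalization (casing, punctuation, number formats) before scoring, which this sketch omits.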
