NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages
Marie Maltais, Yejin Jeon, Min Ma, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Maryam Ibrahim Mukhtar, Daud Abolade, Joel Okepefi, Johnson Sewedo, David Ifeoluwa Adelani

TL;DR
NaijaS2ST introduces a parallel speech translation dataset for four Nigerian languages paired with English and benchmarks cascaded, end-to-end, and audio LLM approaches. Audio LLMs excel at speech-to-text translation, while speech-to-speech translation remains challenging.
Contribution
The paper presents a new multilingual Nigerian speech translation dataset and provides a systematic benchmark of different translation methods in low-resource settings.
Findings
Audio LLMs with few-shot examples outperform cascaded and end-to-end models in speech-to-text translation.
Cascaded and audio LLM approaches perform similarly in speech-to-speech translation.
NaijaS2ST offers a valuable resource for advancing low-resource multilingual speech translation.
Abstract
Speech translation for low-resource languages remains fundamentally limited by the scarcity of high-quality, diverse parallel speech data, a challenge that is especially pronounced in African linguistic contexts. To address this, we introduce NaijaS2ST, a parallel speech translation dataset spanning Igbo, Hausa, Yorùbá, and Nigerian Pidgin paired with English. The dataset comprises approximately 50 hours of speech per language and captures substantial variation in speakers and accents, reflecting realistic multilingual and multi-accent conditions. With NaijaS2ST, we conduct a comprehensive benchmark of cascaded, end-to-end (E2E), and AudioLLM-based approaches across bidirectional translation settings. Our results show that audio LLMs with few-shot examples are more effective for speech-to-text translation than cascaded and end-to-end methods trained on fine-tuned data. However, for…
