End-to-End Automatic Speech Recognition model for the Sudanese Dialect
Ayman Mansour, Wafaa F. Mukhtar

TL;DR
This paper explores the development of an end-to-end speech recognition model for the Sudanese dialect, addressing resource scarcity and demonstrating a baseline with a 73.67% label error rate.
Contribution
It introduces a novel dataset for the Sudanese dialect and proposes a CNN-based end-to-end speech recognition model tailored for this underrepresented language variant.
Findings
Achieved an average Label Error Rate of 73.67%.
Constructed a modest Sudanese dialect dataset.
Provided insights into recognition challenges for the dialect.
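The summary does not say how the Label Error Rate is computed. As a rough illustration only, the sketch below assumes the common definition: the edit distance between the predicted and reference label sequences, divided by the reference length and averaged over utterances. The function names `edit_distance` and `label_error_rate` are hypothetical and not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_error_rate(pairs):
    """Average per-utterance label error rate, as a percentage.

    `pairs` is an iterable of (reference, hypothesis) label sequences.
    Assumption: pairs with an empty reference are skipped.
    """
    rates = [edit_distance(ref, hyp) / len(ref) for ref, hyp in pairs if len(ref) > 0]
    if not rates:
        raise ValueError("no non-empty reference sequences given")
    return 100.0 * sum(rates) / len(rates)
```

Under this assumed definition, a rate of 73.67% would mean that, on average, the number of label edits needed to turn the model's output into the reference is roughly three quarters of the reference length.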
Abstract
Designing a natural voice interface relies mostly on speech recognition for interaction between humans and their modern digital equipment. In addition, speech recognition narrows the gap between monolingual individuals, helping them communicate. However, the field lacks wide support for several universal languages and their dialects, even though most daily conversations are carried out in them. This paper examines the viability of designing an Automatic Speech Recognition model for the Sudanese dialect, one of the dialects of Arabic, whose complexity is a product of historical and social conditions unique to its speakers. These conditions are reflected in both the form and content of the dialect, so this paper gives an overview of the Sudanese dialect and of the tasks of collecting representative resources and the pre-processing performed to construct a modest…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
Methods: Convolution
