A Deep Learning Automatic Speech Recognition Model for Shona Language
Leslie Wellington Sirora, Mainford Mutandavari

TL;DR
This paper develops a deep learning-based automatic speech recognition system for the Shona language, addressing challenges of limited data and tonal complexity, and demonstrating improved accuracy over traditional models.
Contribution
It introduces a hybrid deep learning architecture with data augmentation and transfer learning tailored for Shona speech recognition, a low-resource language.
Findings
Achieved 74% overall accuracy in Shona speech recognition.
Reduced Word Error Rate to 29% using deep learning techniques.
Demonstrated the effectiveness of transfer learning and attention mechanisms for tonal languages.
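The Word Error Rate figure above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The summary does not say how the authors computed it, so the following is only a minimal sketch of that standard definition. The function name `word_error_rate` and the placeholder tokens are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); assumes whitespace tokenization,
    which may differ from the tokenization used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("w1 w2 w3 w4", "w1 x w3")` counts one substitution and one deletion against four reference words, giving 0.5. Because insertions are counted, WER can exceed 1.0, so it is not simply the complement of an accuracy figure.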
Abstract
This study presented the development of a deep learning-based Automatic Speech Recognition system for Shona, a low-resource language characterized by unique tonal and grammatical complexities. The research aimed to address the challenges posed by limited training data, lack of labelled data, and the intricate tonal nuances present in Shona speech, with the objective of achieving significant improvements in recognition accuracy compared to traditional statistical models. The research first explored the feasibility of using deep learning to develop an accurate ASR system for Shona. Second, it investigated the specific challenges involved in designing and implementing deep learning architectures for Shona speech recognition and proposed strategies to mitigate these challenges. Lastly, it compared the performance of the deep learning-based model with existing statistical models in terms of…
