Automatic Speech Recognition for Humanitarian Applications in Somali

Raghav Menon; Astik Biswas; Armin Saeb; John Quinn; Thomas Niesler

arXiv:1807.08669 · cs.CL · July 24, 2018

TL;DR

This paper develops an initial Somali speech recognition system from only 1.57 hours of annotated speech, using neural acoustic models and data augmentation to reach a word error rate (WER) of 53.75% in support of humanitarian applications.

Contribution

It presents the authors' first Somali speech recognition system, built with neural acoustic models and data augmentation for an under-resourced language in a humanitarian setting.

Findings

01 Data augmentation improves performance

02 Neural architectures outperform traditional models

03 Achieved 53.75% WER with CNN, TDNN, and BLSTM
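
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. A WER of 53.75% therefore corresponds to roughly 54 word errors per 100 reference words. The sketch below is a minimal illustration of that standard definition, not the authors' scoring tool; the function name `word_error_rate` and the whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" counts one substitution and one deletion over four reference words. Because insertions are counted as errors, WER can exceed 100% when the hypothesis is much longer than the reference.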

Abstract

We present our first efforts in building an automatic speech recognition system for Somali, an under-resourced language, using 1.57 hrs of annotated speech for acoustic model training. The system is part of an ongoing effort by the United Nations (UN) to implement keyword spotting systems supporting humanitarian relief programmes in parts of Africa where languages are severely under-resourced. We evaluate several types of acoustic model, including recent neural architectures. Language model data augmentation using a combination of recurrent neural networks (RNN) and long short-term memory neural networks (LSTMs) as well as the perturbation of acoustic data are also considered. We find that both types of data augmentation are beneficial to performance, with our best system using a combination of convolutional neural networks (CNNs), time-delay neural networks (TDNNs) and bi-directional…
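
The visible part of the abstract does not say which kind of acoustic perturbation was applied. One common choice in low-resource speech recognition is speed perturbation, where each training utterance is resampled at slightly different rates to create extra copies. The sketch below illustrates that idea with plain linear interpolation; the function name `speed_perturb`, the factor values, and the interpolation method are assumptions for illustration and are not taken from the paper.

```python
def speed_perturb(samples, factor):
    """Return a copy of `samples` played `factor` times faster.

    Hypothetical helper: a factor above 1 shortens the signal, a factor
    below 1 lengthens it. Linear interpolation is used here for brevity;
    production toolkits typically use a proper resampling filter.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(samples)
    if n_in == 0:
        return []

    n_out = max(1, round(n_in / factor))
    out = []
    for k in range(n_out):
        pos = k * factor          # fractional position in the input signal
        i = int(pos)
        if i >= n_in - 1:
            # Past the last interpolation interval: hold the final sample.
            out.append(float(samples[-1]))
        else:
            frac = pos - i
            out.append((1.0 - frac) * samples[i] + frac * samples[i + 1])
    return out


# Assumed example factors; each utterance yields slower and faster copies.
utterance = [float(n) for n in range(1000)]
augmented = [speed_perturb(utterance, f) for f in (0.9, 1.0, 1.1)]
```

Each perturbed copy keeps the original transcript, so the amount of acoustic training material grows without any additional annotation.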
