Semantically Corrected Amharic Automatic Speech Recognition
Samuael Adnew, Paul Pu Liang

TL;DR
This paper improves Amharic speech recognition by correcting dataset transcriptions and introducing a transformer-based post-processing step, significantly enhancing semantic accuracy and providing reliable benchmarks for future research.
Contribution
It releases corrected Amharic ASR datasets and proposes a novel transformer-based post-processing method to improve semantic correctness in recognition outputs.
Findings
Achieved CER of 5.5% and WER of 23.3% on the corrected dataset.
Revealed that existing benchmarks inflate performance metrics because they ignore the spacings that mark word boundaries.
Enhanced the semantic accuracy of Amharic ASR systems.
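Why CER and WER can diverge so much is easiest to see with a small example. The sketch below is a minimal illustration and not the authors' evaluation code. It computes both metrics from a standard Levenshtein edit distance and assumes that words come from whitespace tokenization. The strings are hypothetical Latin placeholders, not Amharic text. Dropping a single space is one character edit, so CER barely moves, but WER treats it as a word substitution plus a deletion.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference item
                curr[j - 1] + 1,         # insert a hypothesis item
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    # Character error rate; spaces count as characters here (assumes non-empty ref)
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    # Word error rate over whitespace-separated tokens (assumes non-empty ref)
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

ref = "ab cd ef gh"  # hypothetical reference with 4 words
hyp = "abcd ef gh"   # same graphemes, one word boundary dropped
print(f"CER={cer(ref, hyp):.3f}  WER={wer(ref, hyp):.3f}")  # prints CER=0.091  WER=0.500
```

One missing space costs 1 of 11 characters but 2 of 4 words. This is why a low CER alone can hide segmentation errors that change meaning.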
Abstract
Automatic Speech Recognition (ASR) can play a crucial role in enhancing the accessibility of spoken languages worldwide. In this paper, we build a set of ASR tools for Amharic, a language spoken by more than 50 million people primarily in eastern Africa. Amharic is written in the Ge'ez script, a sequence of graphemes with spacings denoting word boundaries. This makes computational processing of Amharic challenging since the location of spacings can significantly impact the meaning of formed sentences. We find that existing benchmarks for Amharic ASR do not account for these spacings and only measure individual grapheme error rates, leading to significantly inflated measurements of in-the-wild performance. In this paper, we first release corrected transcriptions of existing Amharic ASR test datasets, enabling the community to accurately evaluate progress. Furthermore, we introduce a…
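The abstract notes that some benchmarks measure only grapheme errors and ignore spacings. The sketch below shows how such a metric can overstate quality. The function `grapheme_error_rate_no_spaces` is a hypothetical stand-in for a spacing-blind metric, not a scorer taken from any specific benchmark. The strings are Latin placeholders, not Amharic. A hypothesis that keeps every grapheme but drops or moves word boundaries scores a perfect 0.0 under this metric, while WER exposes the damage.

```python
def levenshtein(a, b):
    # Standard dynamic-programming edit distance over any sequence
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def grapheme_error_rate_no_spaces(ref, hyp):
    # Hypothetical spacing-blind metric: word boundaries are stripped before scoring
    r, h = ref.replace(" ", ""), hyp.replace(" ", "")
    return levenshtein(r, h) / len(r)

def word_error_rate(ref, hyp):
    # Space-aware metric: whitespace-delimited words are the unit of comparison
    r = ref.split()
    return levenshtein(r, hyp.split()) / len(r)

ref = "ab cd ef gh"
hyp = "abcd ef gh"  # one word boundary missing, all graphemes correct
print(grapheme_error_rate_no_spaces(ref, hyp), word_error_rate(ref, hyp))  # prints 0.0 0.5
```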
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
Topics: Language, Linguistics, Cultural Analysis
Methods: Sparse Evolutionary Training
