Automatic Speech Recognition of African American English: Lexical and Contextual Effects
Hamid Mojarad, Kevin Tang

TL;DR
This paper investigates how two African American English (AAE) features, consonant cluster reduction (CCR) and ING-reduction, affect automatic speech recognition (ASR) accuracy, and finds that lexical effects are more prominent in systems without an external language model (LM).
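Lexical neighborhood effects are usually measured with neighborhood density, the number of words in a lexicon that differ from a target by a single phoneme substitution, addition or deletion. This summary does not state the paper's exact operationalization, so the sketch below uses that common one-phoneme definition. The toy lexicon and the function names are hypothetical, and each word has only one pronunciation for simplicity.

```python
from typing import Dict, Tuple

Pron = Tuple[str, ...]  # a pronunciation as a sequence of ARPAbet phones


def one_phoneme_apart(a: Pron, b: Pron) -> bool:
    """True if a and b differ by exactly one phoneme substitution, addition, or deletion."""
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one phone from the longer form must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood_density(word: str, lexicon: Dict[str, Pron]) -> int:
    """Count the other lexicon entries that are one phoneme away from `word`."""
    target = lexicon[word]
    return sum(
        1 for other, pron in lexicon.items()
        if other != word and one_phoneme_apart(target, pron)
    )


# Hypothetical toy lexicon (ARPAbet, stress marks omitted for brevity).
LEXICON: Dict[str, Pron] = {
    "cat": ("K", "AE", "T"),
    "bat": ("B", "AE", "T"),
    "cap": ("K", "AE", "P"),
    "at": ("AE", "T"),
    "cast": ("K", "AE", "S", "T"),
    "dog": ("D", "AO", "G"),
}

print(neighborhood_density("cat", LEXICON))   # 4: bat, cap, at, cast
print(neighborhood_density("cast", LEXICON))  # 1: cat
```

A word in a dense neighborhood has many similar-sounding competitors. An ASR system relying mainly on acoustics, with no LM to supply context, might plausibly confuse such a word more often.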
Contribution
It provides new evidence on how specific AAE features influence ASR performance and compares these effects in systems with and without an external language model.
Findings
CCR and ING-reduction significantly affect Word Error Rate
ASR systems without LMs are more influenced by lexical neighborhood effects
Lexical effects are small but statistically significant
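Word Error Rate (WER) is the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words: (substitutions + deletions + insertions) / N. The minimal sketch below computes it with standard dynamic programming. The example utterance is invented for illustration and does not come from CORAAL. It shows how a CCR token ("jus") and an ING-reduced token misrecognized as two words ("walk in") can each add errors. Real scoring pipelines also normalize case and punctuation first, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j]: minimum word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical utterance: "just" produced with CCR, "walking" with ING-reduction.
ref = "i was just walking home"
hyp = "i was jus walk in home"
print(word_error_rate(ref, hyp))  # 0.6 (2 substitutions + 1 insertion over 5 words)
```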
Abstract
Automatic Speech Recognition (ASR) models often struggle with the phonetic, phonological, and morphosyntactic features found in African American English (AAE). This study focuses on two key AAE variables: Consonant Cluster Reduction (CCR) and ING-reduction. It examines whether the presence of CCR and ING-reduction increases ASR misrecognition. It then investigates whether end-to-end ASR systems without an external Language Model (LM) are more influenced by lexical neighborhood effects and less by contextual predictability than systems with an LM. The Corpus of Regional African American Language (CORAAL) was transcribed using wav2vec 2.0 with and without an LM. CCR and ING-reduction were detected using the Montreal Forced Aligner (MFA) with pronunciation expansion. The analysis reveals a small but significant effect of CCR and ING-reduction on Word Error Rate (WER) and indicates a…
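Pronunciation expansion, as the abstract describes it, means adding reduced variants to the aligner's dictionary. The forced aligner then picks whichever variant best fits the audio, and a token aligned to a reduced variant can be coded as showing CCR or ING-reduction. The rules the paper actually used are not given in this summary. The sketch below makes two hypothetical assumptions. CCR deletes a word-final stop when the preceding consonant has the same voicing. ING-reduction replaces an unstressed final "IH0 NG" with "IH0 N". The output uses tab-separated word and phone lines. Check the exact dictionary format against your MFA version's documentation before use.

```python
from typing import Dict, List, Optional, Tuple

Pron = Tuple[str, ...]  # ARPAbet phones, vowels carry stress digits

STOPS = {"P", "T", "K", "B", "D", "G"}
VOICELESS = {"P", "T", "K", "F", "TH", "S", "SH", "CH", "HH"}
VOICED = {"B", "D", "G", "V", "DH", "Z", "ZH", "JH", "M", "N", "NG", "L", "R", "W", "Y"}


def ccr_variant(pron: Pron) -> Optional[Pron]:
    """Hypothetical CCR rule: drop a final stop after a consonant of the same voicing."""
    if len(pron) < 2 or pron[-1] not in STOPS:
        return None
    prev, last = pron[-2], pron[-1]
    same_voicing = (prev in VOICELESS and last in VOICELESS) or (
        prev in VOICED and last in VOICED
    )
    return pron[:-1] if same_voicing else None


def ing_variant(pron: Pron) -> Optional[Pron]:
    """Hypothetical ING rule: unstressed final IH0 NG becomes IH0 N."""
    if pron[-2:] == ("IH0", "NG"):
        return pron[:-1] + ("N",)
    return None


def expand(lexicon: Dict[str, Pron]) -> List[str]:
    """Return dictionary lines: each canonical form plus any reduced variants."""
    lines = []
    for word, pron in lexicon.items():
        variants = [pron]
        for rule in (ccr_variant, ing_variant):
            variant = rule(pron)
            if variant is not None and variant not in variants:
                variants.append(variant)
        lines.extend(f"{word}\t{' '.join(v)}" for v in variants)
    return lines


LEXICON: Dict[str, Pron] = {
    "just": ("JH", "AH1", "S", "T"),          # voiceless cluster: CCR variant added
    "hand": ("HH", "AE1", "N", "D"),          # voiced cluster: CCR variant added
    "jump": ("JH", "AH1", "M", "P"),          # mixed voicing: no variant
    "walking": ("W", "AO1", "K", "IH0", "NG"),  # ING variant added
    "thing": ("TH", "IH1", "NG"),             # stressed vowel: no variant
}

for line in expand(LEXICON):
    print(line)
```

The voicing condition is one way to restrict CCR to the environments where it is most commonly reported. A looser rule that allows any final consonant cluster would add more variants and give the aligner more chances to choose a reduced form.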
