Hard to Be Heard: Phoneme-Level ASR Analysis of Phonologically Complex, Low-Resource Endangered Languages

V.S.D.S. Mahesh Akavarapu; Michael Daniel; Gerhard Jäger

arXiv:2604.18204 · cs.CL · April 21, 2026

TL;DR

This paper analyzes phoneme-level ASR performance on two low-resource, phonologically complex East Caucasian languages, Archi and Rutul. It identifies data scarcity as a key factor behind recognition errors and shows why phoneme-level evaluation matters.

Contribution

It introduces a phoneme-level analysis framework for low-resource languages, compares state-of-the-art models, and highlights data scarcity's impact on phoneme recognition accuracy.

Findings

01. Phoneme recognition accuracy correlates with training frequency.

02. Wav2vec2 with a language-specific phoneme vocabulary and heuristic output-layer initialization performs comparably to or better than Whisper.

03. Data scarcity explains many errors attributed to phonological complexity.

Abstract

We present a phoneme-level analysis of automatic speech recognition (ASR) for two low-resourced and phonologically complex East Caucasian languages, Archi and Rutul, based on curated and standardized speech-transcript resources totaling approximately 50 minutes and 1 hour 20 minutes of audio, respectively. Existing recordings and transcriptions are consolidated and processed into a form suitable for ASR training and evaluation. We evaluate several state-of-the-art audio and audio-language models, including wav2vec2, Whisper, and Qwen2-Audio. For wav2vec2, we introduce a language-specific phoneme vocabulary with heuristic output-layer initialization, which yields consistent improvements and achieves performance comparable to or exceeding Whisper in these extremely low-resource settings. Beyond standard word and character error rates, we conduct a detailed phoneme-level error analysis. We…
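The abstract refers to word and character error rates alongside a phoneme-level analysis. As a minimal sketch of how such rates are commonly computed (not the authors' evaluation code), the Python below uses Levenshtein distance over token sequences: the same function yields a word, character, or phoneme error rate depending on how the transcript is tokenized. The function names `edit_distance` and `error_rate` and the example tokens are hypothetical, chosen for illustration only.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level error rate: total edits divided by total reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total_len = sum(len(r) for r in refs)
    if total_len == 0:
        raise ValueError("reference is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_len


# Hypothetical phoneme tokens: an ejective recognized as a plain stop
# counts as one substitution out of three reference phonemes.
per = error_rate([["q'", "a", "l"]], [["q", "a", "l"]])
```

Tokenizing into phonemes rather than characters matters when a single phoneme is written with several characters: a character-level rate would score a missed diacritic or digraph differently from a whole-phoneme substitution.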
