Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties
Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David Mortensen, Michael R. Marlo, Graham Neubig

TL;DR
This study demonstrates that fine-tuning multilingual phonetic models like Allosaurus with minimal data significantly improves phoneme recognition for low-resource languages, including Luhya varieties and an endangered language.
Contribution
It introduces new datasets for Luhya language varieties and shows effective phoneme recognition improvements through fine-tuning with limited data.
Findings
Fine-tuning Allosaurus on as few as 100 utterances reduces phoneme error rates.
Allosaurus outperforms baseline models in low-resource scenarios.
The Bukusu and Saamia datasets are, to the authors' knowledge, the first of their kind; East Tusom is evaluated using a separate dataset for that endangered language.
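For readers unfamiliar with the metric named in the findings, the sketch below shows one common way to compute a phoneme error rate: the Levenshtein edit distance between the reference and predicted phoneme sequences, divided by the reference length. The function names and example sequences are hypothetical and are not taken from the paper, whose exact scoring script is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme symbols."""
    # prev[j] holds the distance between the first i-1 reference symbols
    # and the first j hypothesis symbols.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference symbol
                curr[j - 1] + 1,     # insertion of a hypothesis symbol
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution in a three-phoneme reference.
reference = ["k", "a", "t"]
hypothesis = ["k", "a", "d"]
print(phoneme_error_rate(reference, hypothesis))
```

Because the denominator is the reference length, the rate can exceed 1.0 when the hypothesis contains many insertions.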
Abstract
Models pre-trained on multiple languages have shown significant promise for improving speech recognition, particularly for low-resource languages. In this work, we focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation, which incorporates phonological knowledge through a language-dependent allophone layer that associates a universal narrow phone-set with the phonemes that appear in each language. To evaluate in a challenging real-world scenario, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda. To our knowledge, these datasets are the first of their kind. We carry out similar experiments on the dataset of an endangered Tangkhulic language, East Tusom, a Tibeto-Burman language variety spoken mostly in India. We explore both zero-shot and…
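The allophone layer described in the abstract can be illustrated with a small sketch. It assumes that each language-specific phoneme score is obtained by max pooling over the scores of the universal phones listed as its allophones; this pooling choice, the toy inventory, and the scores below are all illustrative assumptions and not the Allosaurus implementation.

```python
# Hypothetical toy mapping from the phonemes of one language to the
# universal narrow phones that realise them (their allophones).
ALLOPHONES = {
    "p": ["p", "pʰ"],
    "t": ["t", "tʰ"],
    "a": ["a"],
}


def phoneme_scores(phone_scores, allophones):
    """Pool universal phone scores into language-specific phoneme scores.

    Assumption: a phoneme's score is the maximum score among its allophones.
    """
    return {
        phoneme: max(phone_scores[phone] for phone in phones)
        for phoneme, phones in allophones.items()
    }


# Invented scores for a single audio frame over the universal phone set.
frame = {"p": 0.1, "pʰ": 2.0, "t": -1.0, "tʰ": 0.5, "a": 1.2}
print(phoneme_scores(frame, ALLOPHONES))
```

Under this assumption, only the mapping is language-dependent, so the universal phone scores can be shared across languages while each language supplies its own phoneme inventory.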
