Fast transcription of speech in low-resource languages
Mark Hasegawa-Johnson, Camille Goudeseune, Gina-Anne Levow

TL;DR
This paper introduces software that transcribes speech in a low-resource language within a few hours, using only a few tens of megabytes of noisy text in that language, a pretrained acoustic model, and a zero-resource G2P table.
Contribution
It presents a transcription pipeline for low-resource languages that chains a pretrained acoustic model, a reversed G2P table, and a language model, and that runs in a few hours on minimal text data.
Findings
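Transcribes forty hours of recorded speech in a surprise language in only a few hours
Operates with only a few tens of megabytes of noisy text in the target language
Combines a pretrained acoustic model, a reversed zero-resource G2P table, and a language model
Has worked with corpora in eight languages, listed in the abstract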
Abstract
We present software that, in only a few hours, transcribes forty hours of recorded speech in a surprise language, using only a few tens of megabytes of noisy text in that language, and a zero-resource grapheme to phoneme (G2P) table. A pretrained acoustic model maps acoustic features to phonemes; a reversed G2P maps these to graphemes; then a language model maps these to a most-likely grapheme sequence, i.e., a transcription. This software has worked successfully with corpora in Arabic, Assam, Kinyarwanda, Russian, Sinhalese, Swahili, Tagalog, and Tamil.
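The last two stages described in the abstract (reversed G2P, then language model) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the tiny G2P table, the add-one-smoothed character bigram model, and the Viterbi-style search are hypothetical stand-ins, not the authors' software. The acoustic model is assumed to have already produced a single phoneme sequence.

```python
import math
from collections import defaultdict


def invert_g2p(g2p):
    """Reverse a grapheme->phoneme table into phoneme->[candidate graphemes]."""
    p2g = defaultdict(list)
    for grapheme, phoneme in g2p.items():
        p2g[phoneme].append(grapheme)
    return dict(p2g)


def train_bigram_lm(words, alphabet):
    """Return logp(prev, ch) from add-one-smoothed character bigram counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        prev = "^"  # word-start symbol
        for ch in word:
            counts[prev][ch] += 1
            prev = ch
    vocab = len(alphabet)

    def logp(prev, ch):
        total = sum(counts[prev].values())
        return math.log((counts[prev][ch] + 1) / (total + vocab))

    return logp


def decode(phonemes, p2g, logp):
    """Pick the most likely grapheme sequence for a phoneme sequence."""
    # best[g] = (log score, output so far) for hypotheses ending in grapheme g
    best = {"^": (0.0, "")}
    for ph in phonemes:
        nxt = {}
        for g in p2g[ph]:
            nxt[g] = max(
                (score + logp(prev, g), out + g)
                for prev, (score, out) in best.items()
            )
        best = nxt
    return max(best.values())[1]


# Hypothetical toy data: "c" and "k" both map to the phoneme /k/.
G2P = {"c": "k", "k": "k", "a": "a", "t": "t", "i": "i"}
P2G = invert_g2p(G2P)
LOGP = train_bigram_lm("cat cat cat kit".split(), set(G2P))

print(decode(["k", "a", "t"], P2G, LOGP))
print(decode(["k", "i", "t"], P2G, LOGP))
```

In this toy setup the phoneme /k/ is ambiguous between "c" and "k", and the text-trained language model resolves it from context, which is why the pipeline needs text in the target language but, in this sketch at least, no transcribed audio.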
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
