Fast transcription of speech in low-resource languages

Mark Hasegawa-Johnson; Camille Goudeseune; Gina-Anne Levow

arXiv:1909.07285·cs.CL·September 17, 2019

TL;DR

This paper presents software that transcribes speech in a low-resource language within a few hours, using a pretrained acoustic model, a zero-resource G2P table, and a few tens of megabytes of noisy text in the target language.

Contribution

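A transcription pipeline for a surprise language that uses only noisy text in that language and a zero-resource G2P table: a pretrained acoustic model maps acoustic features to phonemes, a reversed G2P maps the phonemes to graphemes, and a language model selects the most likely grapheme sequence.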

Findings

01

Transcribes forty hours of recorded speech in a surprise language in only a few hours; has worked with corpora in eight languages

02

Needs only a few tens of megabytes of noisy text in the target language

03

Combines a pretrained acoustic model, a reversed zero-resource G2P table, and a language model

Abstract

We present software that, in only a few hours, transcribes forty hours of recorded speech in a surprise language, using only a few tens of megabytes of noisy text in that language, and a zero-resource grapheme-to-phoneme (G2P) table. A pretrained acoustic model maps acoustic features to phonemes; a reversed G2P maps these to graphemes; then a language model maps these to a most-likely grapheme sequence, i.e., a transcription. This software has worked successfully with corpora in Arabic, Assamese, Kinyarwanda, Russian, Sinhalese, Swahili, Tagalog, and Tamil.
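
The last two stages described in the abstract (reversed G2P, then a language model choosing among grapheme candidates) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes one grapheme per phoneme, a character bigram model with add-alpha smoothing, and a phoneme sequence that has already been produced by some acoustic model. The function names `invert_g2p`, `train_bigram`, and `decode`, and the tiny example table, are hypothetical.

```python
import math
from collections import defaultdict


def invert_g2p(g2p):
    """Reverse a grapheme->phoneme table into phoneme->candidate graphemes."""
    p2g = defaultdict(list)
    for grapheme, phoneme in g2p.items():
        p2g[phoneme].append(grapheme)
    return dict(p2g)


def train_bigram(text, alpha=1.0):
    """Return logp(prev, ch): add-alpha smoothed character bigram log-probability."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set(text) | {"^"}  # "^" is an assumed start-of-text symbol
    prev = "^"
    for ch in text:
        counts[prev][ch] += 1
        prev = ch

    def logp(prev, ch):
        total = sum(counts[prev].values())
        return math.log((counts[prev][ch] + alpha) / (total + alpha * len(vocab)))

    return logp


def decode(phonemes, p2g, logp):
    """Viterbi search for the most likely grapheme string for a phoneme sequence."""
    # best[g] = (log-score, text) of the best path ending in grapheme g
    best = {"^": (0.0, "")}
    for ph in phonemes:
        nxt = {}
        for g in p2g[ph]:
            nxt[g] = max(
                (score + logp(prev, g), text + g)
                for prev, (score, text) in best.items()
            )
        best = nxt
    return max(best.values())[1]


# Toy data: both "c" and "k" are pronounced /k/, so the text model must disambiguate.
g2p = {"c": "k", "k": "k", "a": "a", "t": "t", "i": "i"}
p2g = invert_g2p(g2p)
logp = train_bigram("cat cat kit kit cat")

print(decode(["k", "a", "t"], p2g, logp))
print(decode(["k", "i", "t"], p2g, logp))
```

In this toy setting the phoneme /k/ is ambiguous between two spellings, and the bigram statistics from the text decide which one fits the following vowel. A real system would also need to handle multi-character graphemes, acoustic uncertainty, and word-level language modeling, none of which this sketch attempts.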

Code & Models

Repositories

uiuc-sst/asr24 (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing