Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings

Éric Le Ferrand; Steven Bird; Laurent Besacier

arXiv:2106.06160 · cs.CL · June 14, 2021

TL;DR

This paper compares two spoken term detection methods in very low-resource settings. It shows that a pretrained universal phone recognizer, fine-tuned on a few minutes of target-language speech, outperforms a dynamic time warping (DTW) approach overall, and that representing phoneme recognition ambiguity in a graph structure improves recall while keeping precision high.

Contribution

It applies a pretrained universal phone recognizer, fine-tuned on a few minutes of target-language speech, to low-resource spoken term detection, and represents phoneme recognition ambiguity in a graph structure.

Findings

01 The fine-tuned universal phone recognizer outperforms the DTW approach overall

02 The graph structure boosts recall while maintaining high precision

03 Fine-tuning with only a few minutes of target-language speech is effective

Abstract

We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust ASR system. This work is grounded in a very low-resource language documentation scenario where only a few minutes of recordings have been transcribed for a given language so far. Experiments on two oral languages show that a pretrained universal phone recognizer, fine-tuned with only a few minutes of target language speech, can be used for spoken term detection with better overall performance than a dynamic time warping approach. In addition, we show that representing phoneme recognition ambiguity in a graph structure can further boost recall while maintaining high precision in the low-resource spoken term detection task.
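
For readers unfamiliar with dynamic time warping, the baseline named in the abstract, a minimal textbook sketch follows. It is not the authors' implementation: the abstract does not say which DTW variant, features, or distance the paper uses, so the function name `dtw_distance`, the Euclidean frame distance, and the toy one-dimensional inputs are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of feature vectors.

    Assumption: frames are compared with Euclidean distance; a real
    spoken term detection system would typically use acoustic features
    and a subsequence search rather than whole-sequence alignment.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a
    # with the first j frames of b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the best of: insertion, deletion, or match.
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Hypothetical toy data: a query and a time-stretched copy of it.
query = [(0.0,), (1.0,), (2.0,)]
stretched = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(dtw_distance(query, stretched))
```

Because the warping path may repeat frames, the stretched copy aligns with the query at zero cost, which is the property that makes DTW usable for matching a spoken query against speech produced at a different speed.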

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing