Curriculum optimization for low-resource speech recognition

Anastasia Kuznetsova; Anurag Kumar; Jennifer Drexler Fox; Francis Tyers

arXiv:2202.08883·eess.AS·February 21, 2022

Open Access

TL;DR

This paper introduces an automated curriculum learning method that optimizes the order of training examples for low-resource speech recognition, reducing Word Error Rate by up to 33% relative to the baseline system.

Contribution

The paper presents a new difficulty measure, compression ratio, and an automated curriculum learning approach tailored to low-resource speech recognition.

Findings

01 Up to 33% relative WER reduction over baseline

02 Effective use of compression ratio as difficulty measure

03 Improved training efficiency for low-resource speech data

Abstract

Modern end-to-end speech recognition models show astonishing results in transcribing audio signals into written text. However, conventional data feeding pipelines may be sub-optimal for low-resource speech recognition, which still remains a challenging task. We propose an automated curriculum learning approach to optimize the sequence of training examples based on both the progress of the model while training and prior knowledge about the difficulty of the training examples. We introduce a new difficulty measure called compression ratio that can be used as a scoring function for raw audio in various noise conditions. The proposed method improves speech recognition Word Error Rate performance by up to 33% relative over the baseline system.
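The abstract does not spell out how the compression ratio is computed, so the following is only a minimal sketch under stated assumptions: the ratio is taken as compressed size divided by raw size, a general-purpose lossless compressor (`zlib`) stands in for whatever codec the authors use, and a lower ratio is treated as "easier". The helper names (`compression_ratio`, `to_pcm16`) and the synthetic clips are hypothetical. The static easy-to-hard sort at the end is for illustration only; the paper's curriculum also adapts to the model's training progress, which this sketch does not attempt.

```python
import math
import random
import struct
import zlib


def compression_ratio(pcm_bytes: bytes) -> float:
    """Compressed size / raw size. ASSUMED definition, not taken from the paper."""
    if not pcm_bytes:
        raise ValueError("empty audio buffer")
    return len(zlib.compress(pcm_bytes, 9)) / len(pcm_bytes)


def to_pcm16(samples):
    """Pack float samples in [-1, 1] as little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))


# Two synthetic one-second clips at 16 kHz: a clean tone and the same tone
# with added Gaussian noise (hypothetical stand-ins for real utterances).
rng = random.Random(0)
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noisy = [s + rng.gauss(0.0, 0.1) for s in tone]

clips = {"clean_tone": to_pcm16(tone), "noisy_tone": to_pcm16(noisy)}
scores = {name: compression_ratio(pcm) for name, pcm in clips.items()}

# Static ordering from lowest to highest ratio (assumed easy-to-hard).
curriculum = sorted(scores, key=scores.get)
print(scores)
print(curriculum)
```

Noise makes the byte stream less redundant, so the noisy clip compresses worse and lands later in the ordering. That is the intuition behind using such a score on raw audio across noise conditions, though the exact scoring used in the paper may differ.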

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing