Loss Prediction: End-to-End Active Learning Approach For Speech Recognition

Jian Luo; Jianzong Wang; Ning Cheng; Jing Xiao

arXiv:2107.04289 · eess.AS · July 12, 2021 · IJCNN

Open Access

TL;DR

This paper introduces an end-to-end active learning method for speech recognition that predicts sample loss to select the most informative data, reducing annotation costs and improving model performance.

Contribution

It proposes a novel joint model for speech recognition and loss prediction, leveraging CTC and attention losses for effective active learning.
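The summary above does not state the exact training objective, so the following is only a minimal sketch of what a joint ASR/LP loss could look like. It assumes the ASR loss is a weighted mix of the CTC and attention losses and that the loss-prediction branch is trained with a squared error against that ASR loss. The function name `joint_loss` and the parameters `ctc_weight` and `lp_weight` are hypothetical and are not taken from the paper.

```python
from typing import Tuple


def joint_loss(
    ctc_loss: float,
    att_loss: float,
    predicted_loss: float,
    ctc_weight: float = 0.5,   # assumed interpolation weight, not from the paper
    lp_weight: float = 1.0,    # assumed weight of the loss-prediction term
) -> Tuple[float, float, float]:
    """Return (total, asr_loss, lp_loss) for one sample.

    Assumption: the ASR loss interpolates CTC and attention losses, and the
    loss-prediction (LP) branch regresses onto that ASR loss with squared error.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    # Combine the two ASR training losses into a single target value.
    asr_loss = ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss
    # Penalize the LP branch for mispredicting the ASR loss.
    lp_loss = (predicted_loss - asr_loss) ** 2
    total = asr_loss + lp_weight * lp_loss
    return total, asr_loss, lp_loss


total, asr, lp = joint_loss(ctc_loss=2.0, att_loss=4.0, predicted_loss=2.5)
print(total, asr, lp)
```

In a real system these values would be tensors produced by a speech model; plain floats are used here only to keep the arithmetic visible.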

Findings

01

Outperforms random, least confidence, and estimated loss methods.

02

Validated on English and Chinese speech tasks.

03

Achieves competitive results with reduced annotation effort.

Abstract

End-to-end speech recognition systems usually require large amounts of labeled data, yet annotating speech is complicated and expensive. Active learning addresses this by selecting the most valuable samples for annotation. In this paper, we propose using a predicted loss to estimate the uncertainty of a sample. The CTC (Connectionist Temporal Classification) and attention losses are informative for speech recognition because they are computed over all decoding paths and alignments. We define an end-to-end active learning pipeline that trains a joint ASR/LP (Automatic Speech Recognition/Loss Prediction) model. The proposed approach was validated on an English and a Chinese speech recognition task. The experiments show that our approach achieves competitive results, outperforming random selection, least confidence, and estimated loss methods.
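The selection step described in the abstract can be sketched as ranking unlabeled utterances by their predicted loss and sending the highest-scoring ones for annotation. This is a minimal illustration, not the authors' implementation: `select_for_annotation`, the `predict_loss` callable, and the toy utterance ids and scores are all hypothetical stand-ins for a trained loss-prediction model and a real unlabeled pool.

```python
from typing import Callable, List, Sequence


def select_for_annotation(
    unlabeled_ids: Sequence[str],
    predict_loss: Callable[[str], float],
    budget: int,
) -> List[str]:
    """Return up to `budget` utterance ids with the highest predicted loss."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Score every unlabeled utterance with the loss-prediction model.
    scored = [(predict_loss(uid), uid) for uid in unlabeled_ids]
    # Highest predicted loss first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uid for _, uid in scored[:budget]]


# Toy predicted losses standing in for the output of a trained LP model.
toy_scores = {"utt1": 3.2, "utt2": 0.4, "utt3": 5.1, "utt4": 1.7}
chosen = select_for_annotation(list(toy_scores), toy_scores.__getitem__, budget=2)
print(chosen)  # ['utt3', 'utt1']
```

After the chosen utterances are labeled, they would be added to the training set and the model retrained before the next selection round.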

Taxonomy

Topics: Machine Learning and Algorithms · Speech Recognition and Synthesis · Natural Language Processing Techniques