Loss Prediction: End-to-End Active Learning Approach For Speech Recognition
Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

TL;DR
This paper introduces an end-to-end active learning method for speech recognition that predicts sample loss to select the most informative data, reducing annotation costs and improving model performance.
Contribution
It proposes a novel joint model for speech recognition and loss prediction, leveraging CTC and attention losses for effective active learning.
Findings
Outperforms the random selection, least confidence, and estimated loss baselines.
Validated on English and Chinese speech tasks.
Achieves competitive results with reduced annotation effort.
Abstract
End-to-end speech recognition systems usually require large amounts of labeled data, and annotating speech is complicated and expensive. Active learning addresses this by selecting the most valuable samples for annotation. In this paper, we propose using a predicted loss to estimate the uncertainty of a sample. The CTC (Connectionist Temporal Classification) and attention losses are informative for speech recognition because they are computed over all decoding paths and alignments. We define an end-to-end active learning pipeline that trains a joint ASR/LP (Automatic Speech Recognition/Loss Prediction) model. The proposed approach was validated on an English and a Chinese speech recognition task. The experiments show that our approach achieves competitive results, outperforming the random selection, least confidence, and estimated loss methods.
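The selection step the abstract describes can be illustrated with a minimal sketch: score each unlabeled utterance with a predicted loss and send the highest-scoring ones for annotation. The function name `select_for_annotation`, the `budget` parameter, and the toy scores are all hypothetical; in the paper the scores would come from the loss-prediction branch of the joint ASR/LP model, which is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple


def select_for_annotation(
    unlabeled: Sequence[str],
    predict_loss: Callable[[str], float],
    budget: int,
) -> List[Tuple[str, float]]:
    """Return the `budget` samples with the highest predicted loss.

    `predict_loss` is a stand-in (assumption) for a trained loss predictor;
    a higher predicted loss is treated as higher uncertainty.
    """
    scored = [(utt, predict_loss(utt)) for utt in unlabeled]
    # Sort from most to least uncertain, then keep the top `budget` entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:budget]


# Toy scores standing in for the loss predictor's output (illustrative only).
toy_scores = {"utt_a": 0.2, "utt_b": 1.7, "utt_c": 0.9, "utt_d": 1.1}
picked = select_for_annotation(list(toy_scores), toy_scores.get, budget=2)
print([utt for utt, _ in picked])
```

In a full active learning loop, the selected utterances would be transcribed, added to the labeled set, and the model retrained before the next selection round.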
Taxonomy
Topics: Machine Learning and Algorithms · Speech Recognition and Synthesis · Natural Language Processing Techniques
