Hard Sample Mining for the Improved Retraining of Automatic Speech Recognition

Jiabin Xue; Jiqing Han; Tieran Zheng; Jiaxing Guo; Boyong Wu

arXiv:1904.08031 · cs.SD · April 18, 2019 · 6 cites

TL;DR

This paper introduces a novel hard sample mining approach using deep multiple instance learning to enhance retraining of ASR systems, effectively identifying challenging samples from unlabeled data to improve recognition accuracy.

Contribution

It proposes an enhanced deep multiple instance learning method for hard sample mining, reducing manual labeling efforts and improving ASR retraining performance.

Findings

01

Achieved improved ASR performance with the proposed method.

02

Successfully identified hard samples from unlabeled data.

03

Enhanced deep multiple instance learning effectively finds challenging samples.

Abstract

Retraining with more and more new training data from the target domain is an effective way to improve the performance of existing Automatic Speech Recognition (ASR) systems. Recently, the Deep Neural Network (DNN) has become a successful model in the ASR field. When training DNN-based methods, the error between the transcription and the corresponding annotated text is back-propagated to update and optimize the parameters. The parameters are therefore influenced more by training samples with a large propagated error than by samples with a small one. In this paper, we define samples with significant error as hard samples and try to improve the performance of the ASR system by adding many of them. Unfortunately, hard samples are sparse in the training data of the target domain, and manually labeling them is expensive. Therefore, we propose a hard…
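The notion of a hard sample in the abstract, a sample with a large error, can be illustrated with a minimal sketch. This is not the paper's deep multiple instance learning method, which targets unlabeled data. It assumes labels are available and simply ranks samples by per-sample cross-entropy. The function names, the toy posteriors, and the `top_k` parameter are all hypothetical and chosen only for illustration.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability the model assigns to the reference label."""
    # Clamp to avoid log(0) when the model gives the label zero probability.
    return -math.log(max(probs[label], 1e-12))


def select_hard_samples(posteriors, labels, top_k):
    """Return the indices of the top_k highest-loss samples and all losses.

    Assumption: "hard" means "largest per-sample loss", a simple stand-in
    for the abstract's "samples with significant error".
    """
    losses = [cross_entropy(p, y) for p, y in zip(posteriors, labels)]
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return ranked[:top_k], losses


# Toy posteriors over three classes for four labeled samples (made-up numbers).
posteriors = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.10, 0.80],
    [0.40, 0.35, 0.25],
]
labels = [0, 0, 2, 2]

hard_indices, losses = select_hard_samples(posteriors, labels, top_k=2)
print(hard_indices)
```

Because gradient magnitude grows with the loss, the samples selected this way are the ones that would move the parameters most during retraining. A loss-based ranking like this needs transcriptions, which is the labeling cost the abstract describes as expensive.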

Taxonomy

Topics: Music and Audio Processing · Video Analysis and Summarization · Speech Recognition and Synthesis