Unsupervised data selection for Speech Recognition with contrastive loss ratios
Chanho Park, Rehan Ahmad, Thomas Hain

TL;DR
This paper introduces an unsupervised data selection method for speech recognition that combines contrastive loss ratios with a submodular function. Models trained on the selected data achieve a lower word error rate (WER) than models trained on data selected with GMM-HMM log-likelihoods.
Contribution
It presents an unsupervised data selection technique that feeds frame-level contrastive loss ratios into a submodular function in order to select training data for automatic speech recognition that matches a target data set.
Findings
Outperforms likelihood-based selection in WER reduction
Differs from the likelihood-based method by 20.23% relative WER on Tedtalks when selecting 10 hours of data
Decreases WER by 6.26% on WSJCAM0 with less data
Abstract
This paper proposes an unsupervised data selection method by using a submodular function based on contrastive loss ratios of target and training data sets. A model using a contrastive loss function is trained on both sets. Then the ratio of frame-level losses for each model is used by a submodular function. By using the submodular function, a training set for automatic speech recognition matching the target data set is selected. Experiments show that models trained on the data sets selected by the proposed method outperform the selection method based on log-likelihoods produced by GMM-HMM models, in terms of word error rate (WER). When selecting a fixed amount, e.g. 10 hours of data, the difference between the results of the two methods on Tedtalks was 20.23% WER relative. The method can also be used to select data with the aim of minimising negative transfer, while maintaining or improving…
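The selection pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact submodular function or the direction of the loss ratio, so everything here is an assumption: the function names (`loss_ratio_histogram`, `greedy_select`), the use of a histogram of frame-level log loss ratios as utterance features, the feature-based objective f(S) = sum_b w_b * sqrt(sum over u in S of c[u][b]), and the cost-scaled greedy search under a duration budget. It is one plausible instance of submodular data selection, not the authors' implementation.

```python
import math

def loss_ratio_histogram(loss_target, loss_train, bin_edges):
    """Histogram of frame-level log loss ratios for one utterance.

    Assumption: the ratio is (loss under the training-set model) divided by
    (loss under the target-set model); the paper may define it differently.
    """
    eps = 1e-8  # guards against division by zero and log(0)
    counts = [0.0] * (len(bin_edges) - 1)
    for lt, ltr in zip(loss_target, loss_train):
        r = math.log((ltr + eps) / (lt + eps))
        for b in range(len(counts)):
            if bin_edges[b] <= r < bin_edges[b + 1]:
                counts[b] += 1.0
                break
    return counts

def objective(acc, weights):
    """Feature-based submodular objective: weighted sum of square roots."""
    return sum(w * math.sqrt(a) for w, a in zip(weights, acc))

def greedy_select(features, durations, weights, budget):
    """Greedily pick utterances by marginal gain per unit duration.

    features[u][b]: frame count of utterance u in ratio bin b.
    durations[u]:   length of utterance u, in the same unit as budget.
    weights[b]:     hypothetical importance of bin b for the target set.
    """
    acc = [0.0] * len(weights)
    selected, used = [], 0.0
    remaining = list(range(len(features)))
    while remaining:
        base = objective(acc, weights)
        best, best_gain = None, 0.0
        for u in remaining:
            if used + durations[u] > budget:
                continue  # utterance does not fit in the remaining budget
            cand = [a + f for a, f in zip(acc, features[u])]
            gain = (objective(cand, weights) - base) / durations[u]
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:
            break  # nothing fits, or nothing adds value
        selected.append(best)
        acc = [a + f for a, f in zip(acc, features[best])]
        used += durations[best]
        remaining.remove(best)
    return selected
```

In this sketch the square root gives diminishing returns, so once a ratio bin is well covered, further utterances from the same bin contribute less, which pushes the selection toward covering the bins that the weights mark as important for the target set. A fixed budget such as the 10 hours mentioned in the abstract would be passed as `budget`, with `durations` in hours.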
