Speech Corpora Divergence Based Unsupervised Data Selection for ASR

Changfeng Gao; Gaofeng Cheng; Pengyuan Zhang; Yonghong Yan

arXiv:2302.13222 · cs.CL · February 28, 2023 · 1 citation

TL;DR

This paper introduces an unsupervised speech corpora divergence method for selecting training data that closely matches target speech characteristics, improving ASR performance without requiring labeled data.

Contribution

It proposes a novel unsupervised data selection approach based on speech corpora divergence, computed from the output of a self-supervised model, that captures finer acoustic detail and preserves the diversity of the selected set.

Findings

1. Achieves a 14.8% relative improvement over random selection

2. Performs comparably to or better than supervised selection methods

3. Effective across different accents in the Common Voice dataset

Abstract

Selecting data that matches the application scenario is important for automatic speech recognition (ASR) training, but it is difficult to measure how well a training corpus matches the target. This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD), which measures the similarity between two speech corpora. We first use the self-supervised HuBERT model to discretize each speech corpus into label sequences and calculate the N-gram probability distribution. Then we calculate the Kullback-Leibler divergence between the N-gram distributions as the SCD. Finally, we choose the subset with the minimum SCD to the target corpus for annotation and training. Compared to previous data selection methods, SCD data selection can focus on more acoustic details and guarantee the diversity of the selected set. We evaluate our method on different accents…
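
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the speech has already been discretized into integer label sequences (toy lists stand in for HuBERT cluster IDs here), uses bigrams with additive smoothing, takes the divergence in the direction KL(target || selected), and picks utterances with a simple greedy loop. The function names (`ngram_counts`, `kl_divergence`, `scd`, `greedy_select`), the smoothing constant, the KL direction, and the greedy strategy are all illustrative assumptions that the abstract does not specify.

```python
from collections import Counter
import math


def ngram_counts(sequences, n=2):
    """Count n-grams over a list of discrete label sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-6):
    """KL(P || Q) over the union of observed n-grams.

    Additive smoothing (eps) is an assumption made here so that
    n-grams missing from one corpus do not produce log(0).
    """
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + eps * len(vocab)
    q_total = sum(q_counts.values()) + eps * len(vocab)
    kl = 0.0
    for gram in vocab:
        p = (p_counts.get(gram, 0) + eps) / p_total
        q = (q_counts.get(gram, 0) + eps) / q_total
        kl += p * math.log(p / q)
    return kl


def scd(target_seqs, candidate_seqs, n=2):
    """Divergence between two corpora of label sequences (assumed direction)."""
    return kl_divergence(ngram_counts(target_seqs, n),
                         ngram_counts(candidate_seqs, n))


def greedy_select(target_seqs, pool, budget, n=2):
    """Greedily add the utterance that most reduces divergence to the target.

    A hypothetical selection heuristic; returns indices into `pool`.
    """
    target = ngram_counts(target_seqs, n)
    selected, selected_counts = [], Counter()
    remaining = list(range(len(pool)))
    while remaining and len(selected) < budget:
        best_idx, best_score = None, float("inf")
        for idx in remaining:
            trial = selected_counts + ngram_counts([pool[idx]], n)
            score = kl_divergence(target, trial)
            if score < best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        selected_counts += ngram_counts([pool[best_idx]], n)
        remaining.remove(best_idx)
    return selected


# Toy label sequences standing in for discretized speech.
target = [[1, 2, 1, 2, 1, 2]]
pool = [[3, 4, 3, 4], [1, 2, 1, 2], [5, 5, 5, 5]]
chosen = greedy_select(target, pool, budget=1)
```

Because the score is computed on the pooled N-gram counts of the whole selected set rather than on each utterance in isolation, a loop of this kind favors subsets whose overall distribution matches the target, which is one way to read the abstract's claim about preserving diversity.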

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing