Does Speech enhancement of publicly available data help build robust Speech Recognition Systems?

Bhavya Ghai; Buvana Ramanan; Klaus Mueller

arXiv:1910.13488·eess.AS·November 21, 2019·1 cites

TL;DR

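This paper investigates whether applying speech enhancement to publicly available noisy data can make speech recognition systems more robust. Training on noisy data together with its enhanced version gives a 9.5% better word error rate (WER) than training on noisy data alone, and performs comparably to the ideal case of training on noisy data together with its clean version.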

Contribution

The paper demonstrates that applying speech enhancement to publicly available noisy data improves ASR robustness and achieves results close to the ideal training condition.

Findings

01

Using speech enhancement gives a 9.5% better WER than training on noisy data alone, and 9% better than training on clean data alone

02

Training with enhanced data performs comparably to the ideal case of training on noisy data together with its clean version

03

Publicly available data can be effectively used for robust ASR training

Abstract

Automatic speech recognition (ASR) systems play a key role in many commercial products, including voice assistants. They typically require large amounts of clean speech data for training, which gives an undue advantage to large organizations that hold large amounts of private data. In this paper, we first curate a fairly large dataset from publicly available data sources. We then investigate whether publicly available noisy data can be used to train robust ASR systems. We use speech enhancement to clean the noisy data and then train ASR systems on the noisy data together with its enhanced version. We find that using speech enhancement gives a 9.5% better word error rate (WER) than training on just noisy data and 9% better than training on just clean data. Its performance is also comparable to the ideal case of training on noisy data together with its clean version.
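The results above are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. This is not the authors' evaluation code; the function names `word_error_rate` and `wer_improvement` are hypothetical, and tokenization is assumed to be a plain whitespace split. The abstract does not say whether "9.5% better" is an absolute or a relative difference, so the helper returns both.

```python
from typing import Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def wer_improvement(baseline_wer: float, new_wer: float) -> Tuple[float, float]:
    """Return (absolute, relative) WER reduction of new_wer versus baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    absolute = baseline_wer - new_wer
    return absolute, absolute / baseline_wer


if __name__ == "__main__":
    # Toy transcripts, not data from the paper: one deleted word out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical WER values, chosen only to show the two kinds of difference.
    print(wer_improvement(0.20, 0.18))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, because the denominator counts reference words only.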

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing