Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition

Zehai Tu; Jack Deadman; Ning Ma; Jon Barker

arXiv:2204.04284 · eess.AS · April 12, 2022

Open Access

TL;DR

This paper uses an auditory model that simulates various hearing abilities to augment training data for end-to-end speech recognition, encouraging the recogniser to ignore signal variation that cannot be heard and to rely on robust features.

Contribution

It introduces an auditory-based augmentation method for end-to-end speech recognition that processes training audio with a model of human hearing, studying two of its mechanisms (spectral smearing and loudness recruitment), and reports gains over SpecAugment.

Findings

1. Significant performance improvement over SpecAugment.
2. Auditory model with spectral smearing and loudness recruitment enhances robustness.
3. Statistically significant gains on the LibriSpeech dataset.

Abstract

End-to-end models have achieved significant improvements in automatic speech recognition. One common method to improve the performance of these models is expanding the data space through data augmentation. Meanwhile, front-ends inspired by human hearing have also demonstrated improvements for automatic speech recognisers. In this work, a well-verified auditory-based model, which can simulate various hearing abilities, is investigated for the purpose of data augmentation for end-to-end speech recognition. By introducing the auditory model into the data augmentation process, end-to-end systems are encouraged to ignore variation in the signal that cannot be heard and thereby focus on robust features for speech recognition. Two mechanisms in the auditory model, spectral smearing and loudness recruitment, are studied on the LibriSpeech dataset with a transformer-based end-to-end model. The results…
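
To make the two mechanisms named in the abstract more concrete, the sketch below shows one simplified way they could be applied as augmentation. It is not the paper's auditory model, whose details are not given on this page. The assumptions are: the input is a list of magnitude-spectrum frames, spectral smearing is approximated by a Gaussian blur across frequency bins, and loudness recruitment is approximated by a power-law expansion of magnitudes. The function names, the parameter ranges, and the probability `p` are all hypothetical.

```python
import math
import random


def smear_frame(frame, width_bins=2.0):
    """Blur one magnitude-spectrum frame across frequency.

    Assumption: a Gaussian kernel stands in for the broadened auditory
    filters that spectral smearing is meant to simulate.
    """
    half = math.ceil(3 * width_bins)
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / width_bins) ** 2) for k in offsets]
    smeared = []
    for i in range(len(frame)):
        acc = 0.0
        norm = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(frame):
                acc += w * frame[j]
                norm += w
        # Renormalise so bins near the edges are not attenuated.
        smeared.append(acc / norm)
    return smeared


def recruit_frame(frame, exponent=1.5):
    """Expand the dynamic range of one frame of non-negative magnitudes.

    Assumption: a power law with exponent > 1 stands in for loudness
    recruitment; the peak is kept and quieter bins fall off faster.
    """
    peak = max(frame, default=0.0)
    if peak <= 0.0:
        return list(frame)
    return [peak * (x / peak) ** exponent for x in frame]


def augment_frames(frames, rng, p=0.5):
    """Apply each distortion to a whole utterance with probability p.

    The severity ranges below are illustrative, not taken from the paper.
    """
    width = rng.uniform(1.0, 3.0) if rng.random() < p else None
    exponent = rng.uniform(1.0, 2.0) if rng.random() < p else None
    out = []
    for frame in frames:
        if width is not None:
            frame = smear_frame(frame, width)
        if exponent is not None:
            frame = recruit_frame(frame, exponent)
        out.append(list(frame))
    return out
```

In a training pipeline, a call such as `augment_frames(frames, random.Random(seed))` would sit alongside other augmentations before feature extraction. Drawing one severity per utterance, rather than per frame, keeps the simulated hearing condition consistent within an utterance.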

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing