Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition
Zehai Tu, Jack Deadman, Ning Ma, Jon Barker

TL;DR
This paper investigates an auditory-based model, capable of simulating various hearing abilities, as a data augmentation method for end-to-end speech recognition. The augmentation encourages the recognizer to ignore signal variation that cannot be heard and to focus on robust features.
Contribution
It introduces an auditory-based data augmentation method for end-to-end speech recognition: training audio is processed by an auditory model that simulates various hearing abilities, with the aim of improving robustness.
Findings
Significant performance improvement over SpecAugment.
Auditory model with spectral smearing and loudness recruitment enhances robustness.
Statistically significant gains on the LibriSpeech dataset.
Abstract
End-to-end models have achieved significant improvement on automatic speech recognition. One common method to improve performance of these models is expanding the data-space through data augmentation. Meanwhile, human auditory inspired front-ends have also demonstrated improvement for automatic speech recognisers. In this work, a well-verified auditory-based model, which can simulate various hearing abilities, is investigated for the purpose of data augmentation for end-to-end speech recognition. By introducing the auditory model into the data augmentation process, end-to-end systems are encouraged to ignore variation from the signal that cannot be heard and thereby focus on robust features for speech recognition. Two mechanisms in the auditory model, spectral smearing and loudness recruitment, are studied on the LibriSpeech dataset with a transformer-based end-to-end model. The results…
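The two mechanisms named in the abstract can be illustrated with a rough sketch. The code below is not the auditory model used in the paper. It is a simplified stand-in that assumes spectral smearing can be approximated by smoothing each frame's power spectrum along frequency with a Gaussian kernel, and that loudness recruitment can be approximated by a single broadband envelope expansion. The function names `smear_spectrum` and `expand_envelope` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def smear_spectrum(signal, frame_len=512, smear_bins=4.0):
    """Crude spectral smearing (assumption: Gaussian smoothing of the
    short-time power spectrum, not the paper's auditory model)."""
    hop = frame_len // 2
    window = np.hanning(frame_len)

    # Gaussian smoothing kernel over frequency bins, normalized to sum to 1.
    radius = int(3 * smear_bins)
    kernel = np.exp(-0.5 * (np.arange(-radius, radius + 1) / smear_bins) ** 2)
    kernel /= kernel.sum()

    # Pad so every original sample is covered by two overlapping frames.
    padded = np.concatenate([np.zeros(hop), signal, np.zeros(frame_len)])
    out = np.zeros_like(padded)
    norm = np.zeros_like(padded)

    for start in range(0, len(padded) - frame_len + 1, hop):
        frame = padded[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        # Smooth the power spectrum, keep the original phase.
        power = np.convolve(np.abs(spec) ** 2, kernel, mode="same")
        new_spec = np.sqrt(power) * np.exp(1j * np.angle(spec))
        out[start:start + frame_len] += np.fft.irfft(new_spec, n=frame_len) * window
        norm[start:start + frame_len] += window ** 2

    out /= np.maximum(norm, 1e-8)
    return out[hop:hop + len(signal)]


def expand_envelope(signal, exponent=1.5, smooth_len=256):
    """Crude loudness-recruitment stand-in (assumption: one broadband
    envelope raised to a power, so quiet parts are attenuated more)."""
    env = np.convolve(np.abs(signal), np.ones(smooth_len) / smooth_len, mode="same")
    peak = env.max()
    if peak <= 0:
        return signal.copy()
    gain = (env / peak) ** (exponent - 1.0)  # gain is 1 at the envelope peak
    return signal * gain


# Example: augment a one-second synthetic tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
augmented = expand_envelope(smear_spectrum(tone))
```

In a training pipeline, a transform of this kind would typically be applied to a random subset of utterances before feature extraction. How the paper selects utterances and sets the severity of each mechanism is not stated in the text above.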
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
