Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations
Purvi Agrawal, Sriram Ganapathy

TL;DR
This paper introduces a relevance weighted neural network framework for raw waveform speech recognition that adaptively selects features, significantly improving robustness in noisy and reverberant environments.
Contribution
It proposes a novel relevance weighting mechanism with sub-networks for feature selection in raw waveform acoustic models, enhancing noise robustness.
Findings
Achieves an average relative word error rate (WER) reduction of 10% across datasets.
Improves robustness in noisy and reverberant conditions.
Demonstrates the effectiveness of relevance weighting in raw waveform models.
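For readers unfamiliar with the metric, a relative WER reduction is measured against the baseline WER, not in absolute percentage points. The short sketch below shows the arithmetic. The function name and the WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent, measured against the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical values (NOT from the paper): a baseline at 20.0% WER and a
# new system at 18.0% WER differ by 2 absolute points, i.e. 10% relative.
example_reduction = relative_wer_reduction(20.0, 18.0)
```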
Abstract
Speech recognition in noisy and channel-distorted scenarios is often challenging because current acoustic modeling schemes do not adapt to changes in the signal distribution caused by noise. In this work, we develop a novel acoustic modeling framework for noise-robust speech recognition based on a relevance weighting mechanism. The relevance weighting is achieved using a sub-network approach that performs feature selection. A relevance sub-network is applied to the output of the first layer of a convolutional network model operating on raw speech signals, while a second relevance sub-network is applied to the output of the second convolutional layer. The relevance weights for the first layer correspond to an acoustic filterbank selection, while the relevance weights in the second layer perform modulation filter selection. The model is trained for a speech recognition task on noisy and…
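The relevance weighting idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the sub-network architecture or how the weights are normalized, so the one-hidden-layer scorer, the tanh nonlinearity, the softmax across channels, and all names and sizes below are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relevance_weights(features, w1, b1, w2, b2):
    """Score each channel with a small shared sub-network (assumed design).

    features: list of channels, each a list of frame values.
    w1: one weight vector (length = number of frames) per hidden unit.
    Returns one weight per channel; the weights sum to 1.
    """
    scores = []
    for channel in features:
        hidden = [
            math.tanh(sum(x * w for x, w in zip(channel, unit)) + bias)
            for unit, bias in zip(w1, b1)
        ]
        scores.append(sum(h * w for h, w in zip(hidden, w2)) + b2)
    return softmax(scores)

def apply_relevance(features, weights):
    """Scale every channel by its relevance weight (soft feature selection)."""
    return [[x * w for x in channel] for channel, w in zip(features, weights)]

# Toy stand-in for the first-layer (filterbank-like) output: 8 channels x 20 frames.
rng = random.Random(0)
num_channels, num_frames, hidden_dim = 8, 20, 4
features = [[rng.gauss(0, 1) for _ in range(num_frames)] for _ in range(num_channels)]

# Randomly initialised sub-network parameters; in a real model these would be
# learned jointly with the acoustic model.
w1 = [[rng.gauss(0, 0.1) for _ in range(num_frames)] for _ in range(hidden_dim)]
b1 = [0.0] * hidden_dim
w2 = [rng.gauss(0, 0.1) for _ in range(hidden_dim)]
b2 = 0.0

weights = relevance_weights(features, w1, b1, w2, b2)
weighted = apply_relevance(features, weights)
```

In this sketch, the same pattern would be repeated on the second convolutional layer's output to obtain the second set of relevance weights; only the input representation changes.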
