Blind Signal Dereverberation for Machine Speech Recognition
Samik Sadhu, Hynek Hermansky

TL;DR
This paper introduces a method for dereverberating speech signals by leveraging long-window Fourier transforms and spectral normalization, improving speech recognition in reverberant environments.
Contribution
It proposes a dereverberation technique that uses training speech from the reverberant environment, together with any available clean speech, to compute a spectral normalization vector in the log spectral domain and thereby enhance speech recognition accuracy.
Findings
Effective dereverberation in reverberant environments
Improved speech recognition performance
Uses spectral normalization in the log spectral domain to suppress convolutive noise caused by reverberation
Abstract
We present a method to remove unknown convolutive noise introduced to speech by reverberations of recording environments, utilizing some amount of training speech data from the reverberant environment and any available non-reverberant speech data. Using a Fourier transform computed over long temporal windows, which ideally cover the entire room impulse response, we convert room-induced convolution into addition in the log spectral domain. Next, we compute a spectral normalization vector from statistics gathered over reverberated as well as clean speech in the log spectral domain. During operation, this normalization vector is used to alleviate reverberation in complex speech spectra recorded under the same reverberant conditions. Such dereverberated complex speech spectra are used to compute complex FDLP-spectrograms for use in automatic speech recognition.
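The abstract does not say which statistics form the normalization vector, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the vector is the per-bin difference between the mean log magnitude of reverberant spectra and that of (non-parallel) clean spectra, and that it is applied to the magnitude only, leaving the phase of the complex spectrum untouched. Circular convolution stands in for the idealized case where the analysis window covers the entire room impulse response, so that convolution becomes exact multiplication per frequency bin. All function names, the toy impulse response `h`, and the frame length `N` are hypothetical.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(N^2) DFT of one real-valued frame (adequate for a small demo)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]


def circular_convolve(x, h):
    """Circular convolution: idealizes a window that covers the whole impulse response."""
    n = len(x)
    return [sum(h[j] * x[(t - j) % n] for j in range(len(h))) for t in range(n)]


def mean_log_magnitude(spectra, eps=1e-12):
    """Per-bin average of log|X[k]| over a list of complex spectra."""
    n_bins = len(spectra[0])
    return [sum(math.log(abs(s[k]) + eps) for s in spectra) / len(spectra)
            for k in range(n_bins)]


def normalization_vector(reverb_spectra, clean_spectra):
    """ASSUMED estimator: mean log magnitude of reverberant minus clean speech."""
    rev = mean_log_magnitude(reverb_spectra)
    cln = mean_log_magnitude(clean_spectra)
    return [r - c for r, c in zip(rev, cln)]


def dereverberate(spectrum, norm_vec):
    """Subtract the vector from the log magnitude of each bin; the phase is kept."""
    return [x * math.exp(-v) for x, v in zip(spectrum, norm_vec)]


def random_frames(count, n):
    """White-noise frames used here as a stand-in for speech."""
    return [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(count)]


random.seed(0)
N = 32                      # hypothetical frame length
h = [1.0, 0.6, 0.3, 0.1]    # hypothetical toy room impulse response

# Non-parallel data: the clean and reverberant sets come from different frames.
clean_spectra = [dft(f) for f in random_frames(200, N)]
reverb_spectra = [dft(circular_convolve(f, h)) for f in random_frames(200, N)]

norm_vec = normalization_vector(reverb_spectra, clean_spectra)

# Compare the estimate with the true log magnitude response of the toy room.
true_log_h = [math.log(abs(v)) for v in dft(h + [0.0] * (N - len(h)))]
max_err = max(abs(a - b) for a, b in zip(norm_vec, true_log_h))
print("largest per-bin error in the estimated log|H|:", max_err)

dereverbed = dereverberate(reverb_spectra[0], norm_vec)
```

Because the clean and reverberant statistics come from different frames, the estimate only approximates the room's log magnitude response, and the error shrinks as more frames are averaged. A phase correction, which the abstract's reference to complex spectra may imply, is not attempted in this sketch.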
Taxonomy
Topics: Speech and Audio Processing · Blind Source Separation Techniques · Advanced Adaptive Filtering Techniques
Methods: Spectral Normalization · Convolution
