Speech Recognition Front End Without Information Loss
Matthew Ager, Zoran Cvetkovic, Peter Sollich

TL;DR
This paper proposes a high-dimensional linear feature framework for speech recognition that preserves more acoustic information, improves robustness to additive noise, and outperforms conventional PLP and MFCC features at low signal-to-noise ratios (SNRs).
Contribution
It introduces a generative phoneme modeling approach in high-dimensional linear feature spaces, enhancing noise robustness in speech recognition.
Findings
Outperforms PLP and MFCC classifiers at SNRs below 18 dB.
High-dimensional features combined with MFCC improve recognition across all noise levels.
Linear feature domains allow for exact noise adaptation.
Abstract
Speech representation and modelling in high-dimensional spaces of acoustic waveforms, or a linear transformation thereof, is investigated with the aim of improving the robustness of automatic speech recognition to additive noise. The motivation behind this approach is twofold: (i) the information in acoustic waveforms that is usually removed in the process of extracting low-dimensional features might aid robust recognition by virtue of structured redundancy analogous to channel coding, (ii) linear feature domains allow for exact noise adaptation, as opposed to representations that involve non-linear processing which makes noise adaptation challenging. Thus, we develop a generative framework for phoneme modelling in high-dimensional linear feature domains, and use it in phoneme classification and recognition tasks. Results show that classification and recognition in this framework…
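The abstract's point that linear feature domains allow exact noise adaptation can be made concrete under simplifying assumptions. The sketch below is not the authors' implementation. It assumes each phoneme class is modelled by a Gaussian mixture over features y = A x, where x is a waveform frame and `A` is a hypothetical linear transform, and that the noise is white Gaussian with variance `noise_var`, added to the waveform independently of the speech. Because A(x + n) = Ax + An, every mixture component keeps its weight and mean and its covariance gains exactly `noise_var * A @ A.T`. After non-linear processing such as the logarithm in MFCC extraction, the noisy feature distribution generally has no such closed form and has to be approximated. The helper names `adapt_to_noise` and `classify` are illustrative.

```python
import numpy as np


def gaussian_logpdf(y, mean, cov):
    """Log-density of the multivariate Gaussian N(mean, cov) at y."""
    d = y.shape[-1]
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def gmm_logpdf(y, weights, means, covs):
    """Log-density of a Gaussian mixture, combined with log-sum-exp."""
    comps = np.array([np.log(w) + gaussian_logpdf(y, m, c)
                      for w, m, c in zip(weights, means, covs)])
    top = comps.max()
    return top + np.log(np.exp(comps - top).sum())


def adapt_to_noise(weights, means, covs, A, noise_var):
    """Adapt a clean feature-domain GMM to additive white Gaussian noise.

    Assumes features y = A @ x of waveform frames x, with noise
    n ~ N(0, noise_var * I) added to x independently of the speech.
    Since A is linear, A @ (x + n) = A @ x + A @ n, so each component keeps
    its weight and mean and gains covariance noise_var * A @ A.T exactly.
    """
    noise_cov = noise_var * (A @ A.T)
    return list(weights), list(means), [c + noise_cov for c in covs]


def classify(y, models):
    """Return the label whose GMM assigns y the highest log-likelihood."""
    return max(models, key=lambda label: gmm_logpdf(y, *models[label]))


rng = np.random.default_rng(0)
d, noise_var = 4, 0.5
A = rng.standard_normal((d, d))  # hypothetical stand-in for a linear feature transform

# Monte Carlo check for one clean waveform-domain component N(0, I).
x = rng.multivariate_normal(np.zeros(d), np.eye(d), size=200_000)
n = rng.normal(scale=np.sqrt(noise_var), size=x.shape)
y_noisy = (x + n) @ A.T  # row-wise A @ (x + n)
_, _, [pred_cov] = adapt_to_noise([1.0], [np.zeros(d)], [A @ A.T], A, noise_var)
emp_cov = np.cov(y_noisy, rowvar=False)
print(np.abs(emp_cov - pred_cov).max())  # sampling error only; shrinks as size grows

# Toy two-class classification with noise-adapted models (hypothetical labels).
clean = {"aa": ([1.0], [np.zeros(d)], [A @ A.T]),
         "iy": ([1.0], [A @ np.full(d, 3.0)], [A @ A.T])}
noisy_models = {k: adapt_to_noise(*m, A, noise_var) for k, m in clean.items()}
print(classify(A @ np.full(d, 3.0), noisy_models))  # → iy
```

The Monte Carlo check only confirms the covariance algebra. The class labels `aa` and `iy`, the random transform `A`, and the noise level are illustrative choices, not values from the paper.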
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
