Speech Recognition Front End Without Information Loss

Matthew Ager; Zoran Cvetkovic; Peter Sollich

arXiv:1312.6849 · cs.CL · March 31, 2015 · 1 citation

Open Access

TL;DR

This paper proposes a high-dimensional linear feature framework for speech recognition that preserves more acoustic information and improves robustness to noise, outperforming traditional features at low SNR levels.

Contribution

It introduces a generative phoneme modeling approach in high-dimensional linear feature spaces, enhancing noise robustness in speech recognition.

Findings

01. Better performance than PLP and MFCC classifiers below 18 dB SNR.

02. High-dimensional features combined with MFCC improve recognition across all noise levels.

03. Linear feature domains allow for exact noise adaptation.

Abstract

Speech representation and modelling in high-dimensional spaces of acoustic waveforms, or a linear transformation thereof, is investigated with the aim of improving the robustness of automatic speech recognition to additive noise. The motivation behind this approach is twofold: (i) the information in acoustic waveforms that is usually removed in the process of extracting low-dimensional features might aid robust recognition by virtue of structured redundancy analogous to channel coding, (ii) linear feature domains allow for exact noise adaptation, as opposed to representations that involve non-linear processing which makes noise adaptation challenging. Thus, we develop a generative framework for phoneme modelling in high-dimensional linear feature domains, and use it in phoneme classification and recognition tasks. Results show that classification and recognition in this framework…
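
Point (ii) of the abstract, exact noise adaptation in a linear domain, can be illustrated with a minimal sketch. It rests on a standard fact: if clean features x follow a Gaussian N(mu, Sigma) and independent zero-mean white Gaussian noise with variance sigma^2 is added, the noisy features follow N(mu, Sigma + sigma^2 I) exactly, with no approximation. The single-Gaussian class model, the white-noise assumption, the known noise variance, and all function names below are assumptions made for illustration; this excerpt does not specify the paper's actual models.

```python
import numpy as np

def adapt_gaussian_to_noise(mean, cov, noise_var):
    """Return the exact mean and covariance of x + n.

    Assumes x ~ N(mean, cov) and independent white noise n ~ N(0, noise_var * I).
    """
    dim = mean.shape[0]
    return mean.copy(), cov + noise_var * np.eye(dim)

def gaussian_log_likelihood(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    dim = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (dim * np.log(2.0 * np.pi) + logdet + maha)

def classify(x, class_models, noise_var):
    """Pick the class whose noise-adapted Gaussian best explains noisy x.

    class_models maps a label to a (mean, cov) pair trained on clean data
    (a hypothetical structure used only for this sketch).
    """
    scores = {}
    for label, (mean, cov) in class_models.items():
        adapted_mean, adapted_cov = adapt_gaussian_to_noise(mean, cov, noise_var)
        scores[label] = gaussian_log_likelihood(x, adapted_mean, adapted_cov)
    return max(scores, key=scores.get)
```

With non-linear features such as log-spectral or cepstral coefficients, additive noise does not map to a simple covariance shift like this, which is the contrast the abstract draws.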

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing