Phonetic Error Analysis of Raw Waveform Acoustic Models with Parametric and Non-Parametric CNNs
Erfan Loweimi, Andrea Carmantini, Peter Bell, Steve Renals, Zoran Cvetkovic

TL;DR
This study analyzes phonetic error patterns in raw waveform acoustic models using CNNs and LSTMs, revealing detailed error categorization and confusion patterns, and demonstrates improved performance with transfer learning on the TIMIT dataset.
Contribution
It provides a detailed phonetic error analysis of raw waveform models, including a categorization of errors by broad phonetic class and a comparison of confusion patterns with Filterbank and Wav2vec 2.0 systems, and shows performance gains from transfer learning.
Findings
Achieved 13.7%/15.2% PERs on the TIMIT Dev/Test sets, outperforming previously reported raw waveform models.
Transfer learning further reduces PER to 11.8%/13.7%.
Confusion patterns differ across phonetic categories and are influenced by model type.
Abstract
In this paper, we analyse the error patterns of the raw waveform acoustic models in TIMIT's phone recognition task. Our analysis goes beyond the conventional phone error rate (PER) metric. We categorise the phones into three groups: {affricate, diphthong, fricative, nasal, plosive, semi-vowel, vowel, silence}, {consonant, vowel+, silence}, and {voiced, unvoiced, silence}, and compute the PER for each broad phonetic class in each category. We also construct a confusion matrix for each category using the substitution errors and compare the confusion patterns with those of the Filterbank and Wav2vec 2.0 systems. Our raw waveform acoustic models consist of parametric (Sinc2Net) or non-parametric CNNs and Bidirectional LSTMs, achieving down to 13.7%/15.2% PERs on TIMIT Dev/Test sets, outperforming reported PERs for raw waveform models in the literature. We also investigate the impact of…
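The per-class analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a plain Levenshtein alignment between reference and hypothesis phone sequences, a small illustrative subset of a phone-to-class map (`BROAD_CLASS`, hypothetical), and that insertions are left out of the per-class counts because they have no reference phone. The function names `align` and `class_stats` are likewise made up for this example.

```python
from collections import Counter

# Illustrative subset of a phone-to-broad-class map (an assumption for this
# sketch, not the paper's full table).
BROAD_CLASS = {
    "ch": "affricate", "jh": "affricate",
    "ay": "diphthong", "oy": "diphthong",
    "s": "fricative", "z": "fricative",
    "m": "nasal", "n": "nasal",
    "p": "plosive", "b": "plosive", "t": "plosive", "d": "plosive",
    "l": "semi-vowel", "w": "semi-vowel",
    "iy": "vowel", "ae": "vowel",
    "sil": "silence",
}


def align(ref, hyp):
    """Levenshtein alignment; returns (ref_phone, hyp_phone) pairs.

    A pair (r, None) is a deletion and (None, h) is an insertion.
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # insertion
            j -= 1
    return pairs[::-1]


def class_stats(ref, hyp, phone_to_class=BROAD_CLASS):
    """Per-class error rate and class-level confusion counts.

    Assumption: insertions are skipped, since they have no reference class.
    """
    errors, totals, confusion = Counter(), Counter(), Counter()
    for r, h in align(ref, hyp):
        if r is None:
            continue
        ref_class = phone_to_class[r]
        totals[ref_class] += 1
        if h is None:
            errors[ref_class] += 1
        elif h != r:
            errors[ref_class] += 1
            # Only substitution errors feed the confusion matrix.
            confusion[(ref_class, phone_to_class[h])] += 1
    per = {c: errors[c] / totals[c] for c in totals}
    return per, confusion
```

Calling `class_stats(ref, hyp)` with two phone lists returns a dictionary of per-class error rates and a `Counter` keyed by (reference class, hypothesised class). Swapping in a different map, for example one with the values voiced, unvoiced, and silence, gives the same analysis for another of the three groupings.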
Taxonomy
Topics: Ultrasonics and Acoustic Wave Propagation · Speech and Audio Processing · Underwater Acoustics Research
