Quartered Spectral Envelope and 1D-CNN-based Classification of Normally Phonated and Whispered Speech

S. Johanan Joysingh; P. Vijayalakshmi; T. Nagarajan

arXiv:2408.13746 · eess.AS · August 27, 2024

TL;DR

This paper introduces a high-accuracy, low-overhead 1D-CNN-based system for classifying whispered and normal speech from spectral envelope features, addressing the lack of support for whispered speech in mainstream speech applications.

Contribution

It proposes a novel quartered spectral envelope feature and demonstrates its effectiveness with a 1D-CNN for whisper classification, matching or outperforming state-of-the-art methods.

Findings

01. Achieved 99.31% accuracy on the wTIMIT dataset

02. Achieved 100% accuracy on the CHAINS dataset

03. The system remains robust under white noise conditions

Abstract

Whisper, as a form of speech, is not sufficiently addressed by mainstream speech applications, because systems built for normal speech do not work as expected on whispered speech. A first step toward building a speech application that is inclusive of whispered speech is the successful classification of whispered and normal speech. Such a front-end classification system is expected to have high accuracy and low computational overhead, which is the scope of this paper. One of the characteristics of whispered speech is the absence of the fundamental frequency (or pitch), and hence of the pitch harmonics as well. The presence of the pitch and pitch harmonics in normal speech, and their absence in whispered speech, is evident in the spectral envelope of the Fourier transform. We observe that this characteristic is predominant in the first quarter of the spectrum, and…
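The harmonic cue described in the abstract can be illustrated with a small sketch. It is not the authors' quartered spectral envelope: the abstract is truncated before the feature is defined, so this sketch assumes a 16 kHz sampling rate, a 512-sample frame, a periodic Hann window, and that "first quarter" means the lowest quarter of the one-sided spectrum (0 to fs/8). It also uses a plain log-magnitude spectrum rather than a smoothed envelope. The names `quartered_log_spectrum` and `harmonic_contrast` are hypothetical, and a synthetic harmonic frame and Gaussian noise stand in for normal and whispered speech.

```python
import cmath
import math
import random


def quartered_log_spectrum(frame, eps=1e-10):
    """Hypothetical helper: log-magnitude DFT of a Hann-windowed frame, keeping
    only the lowest quarter of the one-sided spectrum (0 to fs/8).
    ASSUMPTION: this quarter convention is illustrative, not the paper's."""
    n = len(frame)
    # Periodic Hann window to reduce spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    keep = (n // 2 + 1) // 4  # bins 0..n/2 span 0..fs/2; keep the lowest quarter
    out = []
    for k in range(keep):
        # Naive O(n) DFT for bin k; an FFT would be used in practice.
        acc = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                  for i, w in enumerate(windowed))
        out.append(math.log(abs(acc) + eps))
    return out


def harmonic_contrast(log_spec, f0_bin):
    """Hypothetical helper: mean log magnitude at multiples of f0_bin minus the
    mean at the midpoints between them. Large for harmonic (voiced) frames,
    near zero for noise-like (whisper-like) frames."""
    peaks = list(range(f0_bin, len(log_spec), f0_bin))
    valleys = [p + f0_bin // 2 for p in peaks if p + f0_bin // 2 < len(log_spec)]
    peaks = peaks[:len(valleys)]
    return (sum(log_spec[p] for p in peaks) / len(peaks)
            - sum(log_spec[v] for v in valleys) / len(valleys))


# ASSUMED setup: fs = 16 kHz, 512-sample frame, so bins are 31.25 Hz apart
# and f0 = 125 Hz lands exactly on bin 4.
fs, n, f0 = 16000, 512, 125.0
# "Normal" stand-in: 15 harmonics of f0 with 1/h amplitudes.
voiced = [sum(math.cos(2 * math.pi * h * f0 * i / fs) / h for h in range(1, 16))
          for i in range(n)]
# "Whispered" stand-in: Gaussian noise, with no f0 and hence no harmonics.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

voiced_q = quartered_log_spectrum(voiced)
noise_q = quartered_log_spectrum(noise)
print(len(voiced_q))  # 64 bins, covering 0 to about 2 kHz
print(harmonic_contrast(voiced_q, 4), harmonic_contrast(noise_q, 4))
```

The harmonic frame shows a large peak-to-valley contrast in the kept bins, while the noise frame shows almost none, which mirrors the pitch-harmonic cue the abstract describes. The naive DFT keeps the sketch dependency-free; a practical front end would use an FFT (for example `numpy.fft.rfft`), and a vector of this kind is the sort of input a 1D-CNN classifier would consume.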
