Non-linear frequency warping using constant-Q transformation for speech emotion recognition

Premjeet Singh; Goutam Saha; Md Sahidullah

arXiv:2102.04029·eess.AS·February 9, 2021

TL;DR

This paper investigates the constant-Q transform (CQT) for speech emotion recognition (SER). It reports that CQT-based features outperform STFT-based features and generalize better across corpora, which the authors attribute to CQT's higher frequency resolution at low frequencies.

Contribution

The study applies CQT-based short-term acoustic features to SER and compares them with STFT-based features using a deep neural network (DNN) back-end classifier, reporting better recognition performance and better cross-corpus generalization for CQT.

Findings

01

CQT-based features outperform STFT features in SER accuracy.

02

CQT features offer better generalization across different datasets.

03

CQT's higher frequency resolution at low frequencies captures more emotion-related information.

Abstract

In this work, we explore the constant-Q transform (CQT) for speech emotion recognition (SER). The CQT-based time-frequency analysis provides variable spectro-temporal resolution with higher frequency resolution at lower frequencies. Since lower-frequency regions of the speech signal contain more emotion-related information than higher-frequency regions, the increased low-frequency resolution of CQT makes it more promising for SER than the standard short-time Fourier transform (STFT). We present a comparative analysis of short-term acoustic features based on STFT and CQT for SER with a deep neural network (DNN) as a back-end classifier. We optimize different parameters for both features. The CQT-based features outperform the STFT-based spectral features for SER experiments. Further experiments with cross-corpora evaluation demonstrate that the CQT-based systems provide better generalization with…
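
The variable resolution described in the abstract can be illustrated with a small sketch. It uses the textbook constant-Q definition, in which bin center frequencies are geometrically spaced and the quality factor Q = 1 / (2^(1/B) - 1) fixes the ratio of center frequency to bandwidth for B bins per octave. The function names and every numeric setting below (sample rate, minimum frequency, bins per octave, FFT size) are hypothetical values chosen for illustration; they are not the parameters used in the paper.

```python
import math

def cqt_bins(f_min, bins_per_octave, n_bins):
    """Return (center_frequency, bandwidth) pairs in Hz for a constant-Q filterbank."""
    # Q is the constant ratio of center frequency to bandwidth.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # geometric spacing
        bins.append((f_k, f_k / q))                 # bandwidth grows with frequency
    return bins

def stft_resolution(sample_rate, n_fft):
    """Uniform bin spacing of an STFT in Hz."""
    return sample_rate / n_fft

# Assumed settings for illustration only, not taken from the paper.
SAMPLE_RATE = 16000
bins = cqt_bins(f_min=32.7, bins_per_octave=12, n_bins=84)
stft_df = stft_resolution(SAMPLE_RATE, n_fft=512)

# Print one bin per octave and note which transform resolves frequency more finely there.
for f_k, bw in bins[::12]:
    finer = "CQT finer" if bw < stft_df else "STFT finer"
    print(f"{f_k:8.1f} Hz | CQT bandwidth {bw:7.2f} Hz | STFT spacing {stft_df:.2f} Hz | {finer}")
```

Under these assumed settings the CQT bandwidth is narrower than the fixed STFT spacing in the lowest octaves and wider in the highest ones, which is the trade-off the abstract refers to.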
