Analysis of constant-Q filterbank based representations for speech emotion recognition

Premjeet Singh; Shefali Waldekar; Md Sahidullah; Goutam Saha

arXiv:2211.16363·eess.AS·November 30, 2022

TL;DR

This paper investigates constant-Q filterbank-based time-frequency representations for speech emotion recognition, demonstrating their advantages in frequency resolution and robustness against pitch variations, leading to improved emotion classification performance.

Contribution

It provides a comprehensive analysis of constant-Q representations for SER, highlighting their benefits over traditional features and validating their effectiveness with deep neural networks.

Findings

01

Constant-Q features offer higher low-frequency resolution.

02

They provide increased robustness against pitch variations.

03

SER performance improves with constant-Q representations.

Abstract

This work analyzes constant-Q filterbank-based time-frequency representations for speech emotion recognition (SER). A constant-Q filterbank provides a non-linear spectro-temporal representation with higher frequency resolution at low frequencies. Our investigation reveals how the increased low-frequency resolution benefits SER. A time-domain comparative analysis between short-term mel-frequency spectral coefficients (MFSCs) and constant-Q filterbank-based features, namely the constant-Q transform (CQT) and the continuous wavelet transform (CWT), reveals that constant-Q representations provide higher time-invariance at low frequencies. This provides increased robustness against emotion-irrelevant temporal variations in pitch, especially for low-arousal emotions. The corresponding frequency-domain analysis over different emotion classes shows better resolution of pitch harmonics in…
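The claim that a constant-Q filterbank has higher frequency resolution at low frequencies follows from its definition: center frequencies are spaced geometrically and every filter keeps the same ratio Q of center frequency to bandwidth, so low-frequency filters are narrow and high-frequency filters are wide. The sketch below illustrates this with plain Python. The function name `constant_q_filterbank` and the parameter values (`f_min=32.7` Hz, 12 bins per octave, 84 bins) are illustrative assumptions and are not taken from the paper.

```python
import math


def constant_q_filterbank(f_min=32.7, bins_per_octave=12, n_bins=84):
    """Return (center_freqs, bandwidths, q) for a geometrically spaced filterbank.

    All default values are assumptions chosen for illustration only.
    """
    # Q is the ratio of center frequency to bandwidth, identical for every bin.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    # Center frequencies double every `bins_per_octave` bins.
    centers = [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    # Bandwidth grows in proportion to center frequency.
    bandwidths = [f / q for f in centers]
    return centers, bandwidths, q


centers, bandwidths, q = constant_q_filterbank()

# Print a few bins to show how bandwidth widens with frequency.
for k in (0, 12, 48, 83):
    print(f"bin {k:2d}: center = {centers[k]:8.1f} Hz, bandwidth = {bandwidths[k]:7.2f} Hz")
```

By contrast, a short-term Fourier analysis with a fixed window length has the same bandwidth in every bin, so closely spaced low-frequency components such as the first few pitch harmonics share fewer, coarser bins than they do in the constant-Q layout.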
