SigWavNet: Learning Multiresolution Signal Wavelet Network for Speech Emotion Recognition

Alaa Nfissi; Wassim Bouachir; Nizar Bouguila; Brian Mishara

arXiv:2502.00310·cs.SD·February 4, 2025

TL;DR

SigWavNet introduces an end-to-end deep learning framework that combines wavelet transforms with neural networks to improve speech emotion recognition by capturing multi-resolution features directly from raw speech signals.

Contribution

The paper presents a novel wavelet-based deep learning model that learns wavelet bases and denoising jointly, enhancing feature extraction for speech emotion recognition without pre-processing.

Findings

01. Outperforms state-of-the-art methods on the IEMOCAP and EMO-DB datasets.

02. Effectively handles variable-length speech signals.

03. Eliminates the need for pre- or post-processing steps.

Abstract

In the field of human-computer interaction and psychological assessment, speech emotion recognition (SER) plays an important role in deciphering emotional states from speech signals. Despite advancements, challenges persist due to system complexity, feature distinctiveness issues, and noise interference. This paper introduces a new end-to-end (E2E) deep learning multi-resolution framework for SER, addressing these limitations by extracting meaningful representations directly from raw waveform speech signals. By leveraging the properties of the fast discrete wavelet transform (FDWT), including the cascade algorithm, conjugate quadrature filter, and coefficient denoising, our approach introduces a learnable model for both wavelet bases and denoising through deep learning techniques. The framework incorporates an activation function for learnable asymmetric hard thresholding of wavelet…
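The wavelet ingredients named in the abstract (cascade algorithm, conjugate quadrature filter, coefficient thresholding) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes a fixed Haar low-pass filter and fixed thresholds where the paper learns both, it uses circular indexing at the signal boundary, and it applies a plain non-differentiable hard threshold rather than the learnable activation described in the paper. All function names (`conjugate_quadrature`, `analysis_step`, `cascade`, `asymmetric_hard_threshold`) are hypothetical.

```python
import math


def conjugate_quadrature(h):
    """Build the high-pass filter from low-pass h: g[n] = (-1)^n * h[L-1-n]."""
    L = len(h)
    return [(-1) ** n * h[L - 1 - n] for n in range(L)]


def analysis_step(x, h, g):
    """One DWT level: filter with h and g, keeping every second output sample.

    Circular indexing at the boundary is an assumption of this sketch.
    """
    N = len(x)
    approx = [sum(h[k] * x[(2 * i + k) % N] for k in range(len(h)))
              for i in range(N // 2)]
    detail = [sum(g[k] * x[(2 * i + k) % N] for k in range(len(g)))
              for i in range(N // 2)]
    return approx, detail


def cascade(x, h, levels):
    """Cascade algorithm: re-apply the analysis step to the approximation band.

    Assumes len(x) is divisible by 2**levels.
    """
    g = conjugate_quadrature(h)
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = analysis_step(approx, h, g)
        details.append(d)
    return approx, details


def asymmetric_hard_threshold(coeffs, neg_thr, pos_thr):
    """Zero coefficients strictly inside (-neg_thr, pos_thr); keep the rest.

    The two thresholds are fixed here; in the paper they are learnable.
    """
    return [c if (c >= pos_thr or c <= -neg_thr) else 0.0 for c in coeffs]


# Illustrative fixed Haar filter and toy signal (both assumptions).
h_haar = [1 / math.sqrt(2), 1 / math.sqrt(2)]
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]

approx, details = cascade(signal, h_haar, levels=2)
denoised = [asymmetric_hard_threshold(d, neg_thr=2.0, pos_thr=2.5)
            for d in details]
```

Because the Haar pair is orthonormal, the coefficients in `approx` and `details` carry the same total energy as `signal`. Different thresholds for negative and positive coefficients are what make the thresholding asymmetric.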

Code & Models

Repositories

alaanfissi/sigwavnet-learning-multiresolution-signal-wavelet-network-for-speech-emotion-recognition
PyTorch · Official

Taxonomy

Methods: Softmax · Attention Is All You Need