TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition

Chengxin Chen; Pengyuan Zhang

arXiv:2404.12979·cs.SD·September 4, 2024

TL;DR

TRNet is a two-level refinement network that uses a pre-trained speech enhancement module to make speech emotion recognition (SER) robust to noise, without degrading performance in noise-free conditions.

Contribution

Introduces TRNet, a two-level refinement approach that combines front-end speech enhancement with refinement of spectrograms and deep representations for robust speech emotion recognition (SER) in noisy environments.

Findings

01 Significantly improves SER accuracy in noisy conditions

02 Maintains performance in noise-free environments

03 Effective in both matched and unmatched noise scenarios

Abstract

One persistent challenge in Speech Emotion Recognition (SER) is ubiquitous environmental noise, which frequently degrades SER performance in practice. In this paper, we introduce a Two-level Refinement Network, dubbed TRNet, to address this challenge. Specifically, a pre-trained speech enhancement module is employed for front-end noise reduction and noise level estimation. Then, during model training, we use clean speech spectrograms and their corresponding deep representations as reference signals to refine the spectrogram distortion and representation shift of the enhanced speech. Experimental results validate that TRNet substantially improves the system's robustness in both matched and unmatched noisy environments, without compromising its performance in noise-free environments.
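The two refinement levels described in the abstract can be illustrated with a small training-objective sketch. The abstract does not give the actual loss functions, weights, or architecture, so everything below is an assumption made for illustration: the function names, the choice of mean absolute error for the spectrogram level, mean squared error for the representation level, and the weights `w_spec` and `w_rep` are all hypothetical. The noise level estimation step is omitted.

```python
import math


def mean_abs_error(a, b):
    """Mean absolute difference between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_sq_error(a, b):
    """Mean squared difference between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one utterance (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def two_level_loss(enh_spec, clean_spec, enh_rep, clean_rep,
                   logits, label, w_spec=1.0, w_rep=1.0):
    """Hypothetical combined objective; NOT the paper's stated formula.

    enh_spec / clean_spec: flattened spectrograms of enhanced and clean speech.
    enh_rep / clean_rep:   flattened deep representations of the same pair.
    logits, label:         emotion classifier output and ground-truth class.
    """
    spec_term = mean_abs_error(enh_spec, clean_spec)  # level 1: spectrogram distortion
    rep_term = mean_sq_error(enh_rep, clean_rep)      # level 2: representation shift
    ser_term = cross_entropy(logits, label)           # emotion classification
    total = ser_term + w_spec * spec_term + w_rep * rep_term
    return total, {"ser": ser_term, "spec": spec_term, "rep": rep_term}
```

In this sketch the clean references appear only in the loss, which matches the abstract's statement that they are used during model training. When the enhanced input already matches the clean reference, both refinement terms vanish and only the classification term remains.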
