DWFormer: Dynamic Window transFormer for Speech Emotion Recognition

Shuaiqi Chen; Xiaofen Xing; Weibin Zhang; Weidong Chen; Xiangmin Xu

arXiv:2303.01694 · cs.SD · March 6, 2023 · 1 citation

TL;DR

DWFormer is a novel transformer-based model that dynamically segments speech into windows to better capture emotion-related temporal features at multiple scales, improving speech emotion recognition accuracy.

Contribution

It introduces a dynamic windowing mechanism within a transformer architecture to enhance local and global temporal feature extraction for speech emotion recognition.

Findings

01 Outperforms previous state-of-the-art methods on IEMOCAP and MELD datasets.

02 Effectively captures multi-scale temporal information for emotion recognition.

03 Demonstrates improved accuracy over existing models.

Abstract

Speech emotion recognition is crucial to human-computer interaction. The temporal regions that express different emotions are scattered locally across different parts of the speech. Moreover, the temporal scales of important information may vary over a large range within and across speech segments. Although transformer-based models have made progress in this field, existing models cannot precisely locate important regions at different temporal scales. To address this issue, we propose the Dynamic Window transFormer (DWFormer), a new architecture that leverages temporal importance by dynamically splitting samples into windows. A self-attention mechanism is applied within windows to capture temporally important information locally in a fine-grained way. Cross-window information interaction is also taken into account for global communication. DWFormer is evaluated on both the IEMOCAP and the MELD…
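
The abstract describes three steps: splitting a sample into windows according to temporal importance, applying self-attention inside each window, and letting windows exchange information globally. The sketch below illustrates that general idea only and is not the authors' implementation. Several pieces are assumptions made for the example: the splitting rule (cut wherever a per-frame importance score crosses its mean), the identity query/key/value projections, the mean-pooled window summaries, and all function names (`split_into_windows`, `window_tokens`, `dynamic_window_pass`). The official repository listed under Code & Models is the reference for the actual method.

```python
import math


def split_into_windows(importance, threshold=None):
    """Cut a sequence wherever the importance score crosses the threshold.

    Hypothetical rule for illustration. Returns (start, end) index pairs,
    end exclusive. Assumes a non-empty list of scores.
    """
    if threshold is None:
        threshold = sum(importance) / len(importance)  # mean score as default
    windows, start = [], 0
    for t in range(1, len(importance)):
        if (importance[t] >= threshold) != (importance[t - 1] >= threshold):
            windows.append((start, t))
            start = t
    windows.append((start, len(importance)))
    return windows


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(dim)])
    return outputs


def window_tokens(frames, windows):
    """Mean-pool each window into one summary vector (an assumed choice)."""
    dim = len(frames[0])
    return [[sum(f[i] for f in frames[s:e]) / (e - s) for i in range(dim)]
            for s, e in windows]


def dynamic_window_pass(frames, importance):
    windows = split_into_windows(importance)
    local = []
    for start, end in windows:
        # Local step: attention is restricted to frames in the same window.
        local.extend(self_attention(frames[start:end]))
    # Global step: attention across one pooled token per window.
    global_tokens = self_attention(window_tokens(local, windows))
    return windows, local, global_tokens


importance = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.1, 0.6]  # made-up scores
frames = [[float(t), 1.0] for t in range(8)]            # made-up features
windows, local, global_tokens = dynamic_window_pass(frames, importance)
print(windows)  # [(0, 2), (2, 5), (5, 7), (7, 8)]
```

In this toy input the window lengths vary from one to three frames, which is the point of a dynamic split: window boundaries follow the importance profile instead of a fixed stride.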

Code & Models

Repositories

scutcsq/dwformer
PyTorch · Official

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing