Noise-Resistant Multimodal Transformer for Emotion Recognition

Yuanyuan Liu; Haoyu Zhang; Yibing Zhan; Zijing Chen; Guanghao Yin; Lin; Wei; Zhe Chen

arXiv:2305.02814 · cs.MM · May 5, 2023 · 5 citations

TL;DR

This paper introduces NORM-TR, a noise-resistant multimodal transformer that enhances emotion recognition accuracy by extracting disturbance-insensitive features and employing a noise-aware training scheme, outperforming existing methods.

Contribution

The paper proposes a novel noise-resistant feature extractor and a noise-aware learning scheme for multimodal emotion recognition, improving robustness against noisy data.
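To make the idea of a noise-aware learning scheme more concrete, here is a minimal Python sketch under stated assumptions. This summary does not specify NORM-TR's actual training objective, so the code is not the authors' method. It assumes, hypothetically, that noise awareness can be illustrated by adding Gaussian noise to an input and penalizing the distance between an extractor's outputs on the clean and noisy copies, on top of an ordinary task loss. The names `noise_aware_loss`, `add_gaussian_noise`, `sigma`, and `weight` are illustrative inventions.

```python
import random
from typing import Callable, List

Vector = List[float]


def add_gaussian_noise(x: Vector, sigma: float, rng: random.Random) -> Vector:
    """Return a copy of x with i.i.d. Gaussian noise (std = sigma) added per element."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def noise_aware_loss(
    extractor: Callable[[Vector], Vector],
    x: Vector,
    task_loss: Callable[[Vector], float],
    sigma: float = 0.1,
    weight: float = 1.0,
    seed: int = 0,
) -> float:
    """Hypothetical objective (an assumption, not NORM-TR's published loss):
    task loss on clean features + weight * MSE between clean and noisy features."""
    rng = random.Random(seed)  # fixed seed so the injected noise is reproducible
    clean = extractor(x)
    noisy = extractor(add_gaussian_noise(x, sigma, rng))
    return task_loss(clean) + weight * mse(clean, noisy)
```

As a usage example, compare an identity extractor with a mean-pooling one on the same noisy input: the mean-pooling extractor always receives a consistency penalty no larger than the identity map's, because the squared mean of the noise never exceeds the mean of its squared values. In this toy setting, a more disturbance-insensitive representation is rewarded by the extra term.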

Findings

1. Achieves state-of-the-art performance on multiple datasets.
2. Significantly outperforms existing methods under noisy conditions.
3. Demonstrates the importance of noise resistance in emotion recognition.

Abstract

Multimodal emotion recognition identifies human emotions from various data modalities like video, text, and audio. However, we found that this task can be easily affected by noisy information that does not contain useful semantics. To this end, we present a novel paradigm that attempts to extract noise-resistant features in its pipeline and introduces a noise-aware learning scheme to effectively improve the robustness of multimodal emotion understanding. Our new pipeline, namely Noise-Resistant Multimodal Transformer (NORM-TR), mainly introduces a Noise-Resistant Generic Feature (NRGF) extractor and a Transformer for the multimodal emotion recognition task. In particular, we make the NRGF extractor learn a generic and disturbance-insensitive representation so that consistent and meaningful semantics can be obtained. Furthermore, we apply a Transformer to incorporate Multimodal Features…
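The abstract states that a Transformer combines multimodal features in NORM-TR, but the excerpt is truncated before the details. As a rough, hypothetical sketch of attention-based fusion (not NORM-TR's actual architecture), the Python below places one raw and one noise-resistant feature token per modality into a single sequence, applies single-head scaled dot-product self-attention with identity query/key/value projections, and mean-pools the result into one vector. A real Transformer would typically use learned projections, multi-head attention, and layer normalization. The names `fuse` and `self_attention` are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention.
    Simplification (assumption): Q = K = V = tokens, i.e. no learned projections."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        # Similarity of this query token to every key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def fuse(raw: Matrix, robust: Matrix) -> List[float]:
    """Hypothetical fusion: concatenate raw and noise-resistant tokens into one
    sequence, let every token attend to every other, then mean-pool."""
    seq = raw + robust
    if not seq:
        raise ValueError("need at least one feature token")
    attended = self_attention(seq)
    d = len(seq[0])
    return [sum(t[j] for t in attended) / len(attended) for j in range(d)]
```

Because each attention output is a softmax-weighted average of the input tokens, every fused coordinate stays within the range spanned by the inputs. Learned projections in a full Transformer would remove that property while letting the model decide how much to trust raw versus noise-resistant tokens.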

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech and Audio Processing

Methods: Attention Is All You Need · Adam · Layer Normalization · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Multi-Head Attention · Dense Connections · Absolute Position Encodings