ICANet: A Method of Short Video Emotion Recognition Driven by Multimodal Data

Xuecheng Wu; Mengmeng Tian; Lanhang Zhai

arXiv:2208.11346 · cs.CV · December 10, 2024 · 1 citation

Open Access

TL;DR

ICANet is a multimodal approach for short video emotion recognition that combines audio, video, and optical flow data, significantly improving accuracy over single-modality methods.

Contribution

The paper introduces ICANet, a novel multimodal framework that enhances emotion recognition accuracy in short videos by integrating three different data modalities.

Findings

01. Achieved 80.77% accuracy on the IEMOCAP benchmark.

02. Outperformed state-of-the-art methods by 15.89%.

03. Demonstrated the effectiveness of multimodal data fusion in emotion recognition.

Abstract

With the rapid development of artificial intelligence and short videos, emotion recognition in short videos has become one of the most important research topics in human-computer interaction. At present, most emotion recognition methods still rely on a single modality. However, in daily life, people often disguise their real emotions, so the accuracy of single-modality emotion recognition is relatively poor. Moreover, it is not easy to distinguish similar emotions. Therefore, we propose a new approach, denoted ICANet, that achieves multimodal short video emotion recognition by employing three modalities (audio, video, and optical flow), compensating for the shortcomings of a single modality and thereby improving the accuracy of emotion recognition in short videos. ICANet achieves an accuracy of 80.77% on the IEMOCAP benchmark, exceeding the SOTA…
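
The abstract does not say how ICANet combines its three modalities, so the following is only an illustrative sketch of one common option, late fusion, in which each modality produces its own class scores and the per-modality probabilities are averaged. It is not the authors' architecture. The label set `EMOTIONS`, the function names, the weights, and the example logits are all assumptions made up for illustration.

```python
import math

# Hypothetical label set; the source does not list the classes ICANet predicts.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(logits_by_modality, weights=None):
    """Weighted average of per-modality class probabilities.

    logits_by_modality maps a modality name to one score per emotion.
    With weights=None every modality contributes equally.
    """
    names = list(logits_by_modality)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    fused = [0.0] * len(EMOTIONS)
    for name in names:
        probs = softmax(logits_by_modality[name])
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p
    return fused


def predict(logits_by_modality, weights=None):
    """Return the emotion label with the highest fused probability."""
    fused = late_fusion(logits_by_modality, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Made-up scores standing in for the outputs of three separate encoders.
example = {
    "audio": [0.2, 1.5, 0.1, 0.3],
    "video": [1.0, 1.2, 0.4, 0.1],
    "optical_flow": [0.1, 0.9, 0.8, 0.2],
}
fused_probs = late_fusion(example)
label = predict(example)
```

Averaging probabilities rather than raw logits keeps each modality on the same scale, so one encoder with unusually large scores cannot dominate the decision. A learned fusion layer is another common choice, but nothing in the abstract indicates which strategy ICANet actually uses.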

Taxonomy

Topics: Emotion and Mood Recognition