Using Scene and Semantic Features for Multi-modal Emotion Recognition

Zhifeng Wang; Ramesh Sankaranarayana

arXiv:2308.00228 · cs.CV · August 2, 2023

Open Access

TL;DR

This paper introduces a multi-modal emotion recognition approach that combines scene, semantic, and personal features. It uses a modified EmbraceNet to improve accuracy and robustness, especially when data is incomplete, and is evaluated on the EMOTIC dataset.

Contribution

The paper proposes integrating scene and semantic features with personal features, and using a modified EmbraceNet, to improve emotion recognition accuracy.

Findings

01. Achieved 40.39% average precision on the EMOTIC dataset.

02. Improved robustness when data is partially missing.

03. Outperformed previous methods by 5% in accuracy.

Abstract

Automatic emotion recognition is a hot topic with a wide range of applications. Much work has been done in the area of automatic emotion recognition in recent years. The focus has been mainly on using the characteristics of a person such as speech, facial expression and pose for this purpose. However, the processing of scene and semantic features for emotion recognition has had limited exploration. In this paper, we propose to use combined scene and semantic features, along with personal features, for multi-modal emotion recognition. Scene features will describe the environment or context in which the target person is operating. The semantic feature can include objects that are present in the environment, as well as their attributes and relationships with the target person. In addition, we use a modified EmbraceNet to extract features from the images, which is trained to learn both the…
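
The abstract is cut off before it describes how EmbraceNet was modified, so the sketch below does not attempt to reproduce the authors' model. It illustrates only the fusion step of the original EmbraceNet, which is what makes that family of models tolerant of missing modalities: each modality is first projected to a vector of the same length, and each position of the fused vector is then copied from one randomly chosen available modality. The function name `embrace_fuse`, the modality names, and the toy values are assumptions made for illustration, and uniform selection probabilities are assumed.

```python
import random


def embrace_fuse(docked, available, rng=None):
    """Fuse same-length modality vectors in the style of EmbraceNet.

    docked:    dict mapping a modality name to a list of floats; all
               vectors are assumed to be already projected to length c
    available: dict mapping a modality name to True/False; False marks
               a modality that is missing for this sample
    rng:       optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    names = [n for n in docked if available.get(n, False)]
    if not names:
        raise ValueError("at least one modality must be available")
    c = len(docked[names[0]])
    if any(len(docked[n]) != c for n in names):
        raise ValueError("all docked vectors must share one length")

    # For every output position, pick exactly one available modality
    # uniformly at random and copy its value at that position.
    fused = []
    for i in range(c):
        chosen = rng.choice(names)
        fused.append(docked[chosen][i])
    return fused


# Hypothetical modalities with toy constant vectors, so the source of
# each fused value is easy to read off.
docked = {
    "person": [1.0, 1.0, 1.0, 1.0],
    "scene": [2.0, 2.0, 2.0, 2.0],
    "semantic": [3.0, 3.0, 3.0, 3.0],
}

# The semantic modality is marked missing, so it never contributes.
fused = embrace_fuse(
    docked,
    {"person": True, "scene": True, "semantic": False},
    random.Random(0),
)

# With a single available modality the fused vector is that modality.
only_scene = embrace_fuse(
    docked,
    {"person": False, "scene": True, "semantic": False},
)
```

Because a missing modality is simply excluded from the per-position draw, the fused vector keeps the same length and value range whether one, two, or all three inputs are present, which is the property the robustness claim relies on.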

Taxonomy

Topics: Emotion and Mood Recognition

Methods: EmbraceNet: A robust deep learning architecture for multimodal classification · Focus