Context-aware Cascade Attention-based RNN for Video Emotion Recognition
Man-Chin Sun, Shih-Huan Hsu, Min-Chun Yang, Jen-Hsien Chien

TL;DR
This paper introduces CACA-RNN, a novel cascade RNN architecture that integrates context and facial cues for video emotion recognition and outperforms the MEC2017 baseline.
Contribution
The paper presents a new cascade RNN model that effectively combines context and facial features for enhanced emotion recognition in videos.
Findings
CACA-RNN achieved 45.51% mAP on MEC2017 test set.
The model outperformed the MEC2017 baseline with a 23.81% increase in mAP.
Incorporating context information improves emotion classification accuracy.
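The summary reports results as mAP but does not define the metric. The sketch below assumes that mAP here means macro-averaged precision, the unweighted mean of per-class precision. That is an assumption, not something the summary states. The function name `macro_average_precision` is hypothetical, and the labels and predictions are made-up toy data.

```python
from collections import Counter


def macro_average_precision(y_true, y_pred, labels):
    """Unweighted mean over classes of precision = TP / (TP + FP).

    A class that is never predicted contributes a precision of 0
    (one common convention; other conventions skip such classes).
    """
    predicted = Counter(y_pred)  # TP + FP for each predicted class
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)  # TP per class
    per_class = [correct[c] / predicted[c] if predicted[c] else 0.0 for c in labels]
    return sum(per_class) / len(labels)


# Toy example with hypothetical emotion labels.
y_true = ["happy", "sad", "angry", "happy", "neutral", "sad"]
y_pred = ["happy", "happy", "angry", "happy", "sad", "sad"]
labels = ["happy", "sad", "angry", "neutral"]
score = macro_average_precision(y_true, y_pred, labels)
print(score)
```

Macro averaging weights every emotion class equally. This matters when the class distribution is skewed, because a model that predicts only the majority class still scores poorly.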
Abstract
Emotion recognition can provide crucial information about the user in many applications when building human-computer interaction (HCI) systems. Most current research on visual emotion recognition focuses on facial features. However, context information, including the surrounding environment and the human body, can also provide extra clues for recognizing emotion more accurately. This work proposes a novel architecture, "CACA-RNN", inspired by the sequence-to-sequence model for neural machine translation, which uses an encoder RNN to model the input sequence and a decoder RNN to model the output sequence. The proposed network consists of two cascaded RNNs that process both context and facial information to perform video emotion classification. Results of the model were submitted to the video emotion recognition sub-challenge in…
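The cascade idea from the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions and is not the authors' implementation. Vanilla tanh RNN cells stand in for whatever recurrent units the paper uses. The per-frame context and face feature vectors are random placeholders for real visual features. The sizes are hypothetical. The soft-attention pooling over the facial RNN's hidden states is only one plausible reading of "attention-based" in the title. All function names are hypothetical. The sketch follows the encoder-decoder analogy: the context RNN's final hidden state becomes the initial state of the facial RNN.

```python
import numpy as np


def rnn_forward(x_seq, h0, W_x, W_h, b):
    """Run a vanilla tanh RNN over x_seq of shape (T, d_in); return states (T, d_h)."""
    h = h0
    states = []
    for x_t in x_seq:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)


def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()


def attention_pool(states, w_att):
    """Score each hidden state, softmax over time, return the weighted sum."""
    alpha = softmax(states @ w_att)  # (T,) attention weights
    return alpha @ states, alpha     # (d_h,), (T,)


def init_params(rng, d_ctx, d_face, d_h, n_classes):
    def mat(*shape):
        return rng.normal(0.0, 0.1, size=shape)
    return {
        "ctx": (mat(d_ctx, d_h), mat(d_h, d_h), np.zeros(d_h)),    # context RNN
        "face": (mat(d_face, d_h), mat(d_h, d_h), np.zeros(d_h)),  # facial RNN
        "w_att": mat(d_h),
        "W_out": mat(d_h, n_classes),
        "b_out": np.zeros(n_classes),
    }


def cascade_forward(ctx_feats, face_feats, p):
    d_h = p["ctx"][1].shape[0]
    # Stage 1 ("encoder"): summarise the context stream into a hidden state.
    ctx_states = rnn_forward(ctx_feats, np.zeros(d_h), *p["ctx"])
    # Stage 2 ("decoder"): the facial RNN starts from the context summary.
    face_states = rnn_forward(face_feats, ctx_states[-1], *p["face"])
    pooled, alpha = attention_pool(face_states, p["w_att"])
    probs = softmax(pooled @ p["W_out"] + p["b_out"])  # class probabilities
    return probs, alpha


rng = np.random.default_rng(0)
params = init_params(rng, d_ctx=16, d_face=32, d_h=8, n_classes=8)
T = 10  # hypothetical number of frames
probs, alpha = cascade_forward(rng.normal(size=(T, 16)), rng.normal(size=(T, 32)), params)
print(probs.shape, round(float(probs.sum()), 6))
```

The two stages form a cascade because the second RNN is seeded with the first RNN's final state. A late-fusion alternative would run both RNNs from a zero state and concatenate their outputs before classification.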
Taxonomy
Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
