A Deep Network for Arousal-Valence Emotion Prediction with Acoustic-Visual Cues
Songyou Peng, Le Zhang, Yutong Ban, Meng Fang, Stefan Winkler

TL;DR
This paper presents a deep learning approach that combines acoustic and visual cues to predict emotion along the arousal and valence dimensions.
Contribution
The paper introduces a deep network architecture designed for multimodal arousal-valence prediction from acoustic and visual data.
Findings
Achieved competitive results in the One-Minute Gradual-Emotion Behavior Challenge 2018.
Demonstrated the effectiveness of multimodal cues in emotion prediction.
Provided a detailed methodology for emotion recognition using deep learning.
Abstract
In this paper, we comprehensively describe the methodology of our submissions to the One-Minute Gradual-Emotion Behavior Challenge 2018.
Taxonomy
Topics: Speech and Audio Processing · Emotion and Mood Recognition · Video Surveillance and Tracking Methods
