A Deep Network for Arousal-Valence Emotion Prediction with Acoustic-Visual Cues

Songyou Peng; Le Zhang; Yutong Ban; Meng Fang; Stefan Winkler

arXiv:1805.00638 · cs.CV · June 26, 2019 · 23 cites

TL;DR

This paper presents a deep learning approach that combines acoustic and visual cues to predict emotion along the arousal and valence dimensions, with the aim of improving emotion recognition accuracy.

Contribution

It introduces a novel deep network architecture specifically designed for multimodal emotion prediction using acoustic and visual data.

Findings

01

Achieved competitive results in the One-Minute Gradual-Emotion Behavior Challenge 2018.

02

Demonstrated the effectiveness of multimodal cues in emotion prediction.

03

Provided a detailed methodology for emotion recognition using deep learning.

Abstract

In this paper, we comprehensively describe the methodology of our submissions to the One-Minute Gradual-Emotion Behavior Challenge 2018.

Code & Models

Repositories

pengsongyou/OMG-ADSC (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Emotion and Mood Recognition · Video Surveillance and Tracking Methods