Multi-Modal Emotion recognition on IEMOCAP Dataset using Deep Learning
Samarth Tripathi, Sarthak Tripathi, Homayoon Beigi

TL;DR
This paper presents a multi-modal deep learning approach for emotion recognition on the IEMOCAP dataset, integrating speech, text, and motion capture data to improve accuracy and robustness over prior speech-only methods.
Contribution
It introduces what the authors describe as the first multi-modal neural network approach to emotion recognition on IEMOCAP, combining speech, text, and motion capture data.
Findings
Achieved improved accuracy over speech-only models
Demonstrated effectiveness of multi-modal data fusion
Provided insights into multimodal emotion cues
Abstract
Emotion recognition has become an important field of research in Human-Computer Interaction as we improve upon the techniques for modelling the various aspects of behaviour. As technology and our understanding of emotions advance, there is a growing need for automatic emotion recognition systems. One direction this research is taking is the use of neural networks, which are adept at estimating complex functions that depend on a large number of diverse input sources. In this paper we attempt to exploit this effectiveness of neural networks to perform multimodal emotion recognition on the IEMOCAP dataset using speech, text, and motion capture data from facial expressions, rotation, and hand movements. Prior research has concentrated on emotion detection from speech on the IEMOCAP dataset, but our approach is the first that uses the…
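The abstract shown here is truncated and does not describe the network architecture, so the following is only a minimal sketch of one common way to combine modalities: feature-level fusion by concatenation, followed by a linear softmax classifier. It is not the authors' model. The feature dimensions, the four-emotion label set, the function names, and the random untrained weights are all assumptions made for illustration.

```python
import math
import random

# Hypothetical per-modality feature sizes; the abstract does not specify them.
SPEECH_DIM, TEXT_DIM, MOCAP_DIM = 4, 3, 5
# Illustrative label set, assumed for this sketch only.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def dense(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(speech, text, mocap, weights, bias):
    """Concatenate modality features, then apply a linear softmax classifier."""
    fused = list(speech) + list(text) + list(mocap)
    return softmax(dense(fused, weights, bias))


rng = random.Random(0)
fused_dim = SPEECH_DIM + TEXT_DIM + MOCAP_DIM

# Untrained random weights, used only to show shapes and data flow.
weights = [[rng.uniform(-0.1, 0.1) for _ in range(fused_dim)]
           for _ in EMOTIONS]
bias = [0.0] * len(EMOTIONS)

# Stand-in feature vectors; real inputs would come from per-modality encoders.
speech = [rng.random() for _ in range(SPEECH_DIM)]
text = [rng.random() for _ in range(TEXT_DIM)]
mocap = [rng.random() for _ in range(MOCAP_DIM)]

probs = fuse_and_classify(speech, text, mocap, weights, bias)
predicted = EMOTIONS[probs.index(max(probs))]
```

In a trained system each modality would usually pass through its own encoder before fusion, and the classifier weights would be learned from labelled utterances. Concatenation is used here only because it is the simplest fusion scheme to show.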
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · EEG and Brain-Computer Interfaces
