Attention-based Region of Interest (ROI) Detection for Speech Emotion Recognition
Jay Desai, Houwei Cao, Ravi Shah

TL;DR
This paper introduces an attention-based deep learning approach to detect emotionally salient regions in speech and video, improving emotion recognition accuracy by focusing on key regions within utterances.
Contribution
It proposes a novel attention mechanism within recurrent neural networks to identify and leverage emotionally salient regions for better emotion recognition.
Findings
Attention models outperform LSTM baselines in emotion classification.
Emotionally salient regions correlate with specific emotional expressions.
Attention weights offer an interpretable view of where emotional content occurs within an utterance.
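The summary does not say how salient regions are read off the attention weights. The sketch below assumes one simple heuristic: it flags frames whose weight exceeds the uniform level 1/T and merges consecutive flagged frames into regions. The function name `attention_rois`, the `scale` parameter and the example weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def attention_rois(weights: List[float], scale: float = 1.0) -> List[Tuple[int, int]]:
    """Hypothetical ROI extraction from per-frame attention weights.

    A frame counts as salient if its weight exceeds scale * (1 / T), i.e. the
    model attends to it more than a uniform distribution would (an assumed
    heuristic). Consecutive salient frames are merged into (start, end) spans
    with an exclusive end index.
    """
    T = len(weights)
    if T == 0:
        return []
    threshold = scale / T
    rois: List[Tuple[int, int]] = []
    start = None
    for t, w in enumerate(weights):
        if w > threshold and start is None:
            start = t  # a salient run begins at frame t
        elif w <= threshold and start is not None:
            rois.append((start, t))  # the salient run ended at frame t - 1
            start = None
    if start is not None:
        rois.append((start, T))  # the run extends to the final frame
    return rois


# Illustrative (made-up) attention weights for an 8-frame utterance.
weights = [0.02, 0.03, 0.25, 0.30, 0.05, 0.02, 0.20, 0.13]
print(attention_rois(weights))  # → [(2, 4), (6, 8)]
```

Raising `scale` above 1.0 keeps only the most strongly attended frames. This is a design knob of the sketch, not a parameter reported by the authors.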
Abstract
Automatic emotion recognition for real-life applications is a challenging task. Human emotion expressions are subtle, and can be conveyed by a combination of several emotions. In most existing emotion recognition studies, each audio utterance/video clip is labelled/classified in its entirety. However, utterance/clip-level labelling and classification can be too coarse to capture the subtle intra-utterance/clip temporal dynamics. For example, an utterance/video clip usually contains only a few emotion-salient regions and many emotionless regions. In this study, we propose to use an attention mechanism in deep recurrent neural networks to detect the Regions-of-Interest (ROI) that are more emotionally salient in human emotional speech/video, and further estimate the temporal emotion dynamics by aggregating those emotionally salient regions-of-interest. We compare the ROI from audio and video and…
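The abstract does not give the exact attention equations. A common way to apply attention over recurrent outputs is additive (tanh-scored) attention pooling: score each frame's hidden state, normalise the scores with a softmax, and take the weighted sum as an utterance-level vector. The dependency-free sketch below assumes that formulation. `W`, `b` and `u` stand in for learned parameters and are hand-picked here. In a real model the per-frame hidden states `H` would come from an LSTM, which is not implemented.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Dense matrix-vector product W @ x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(H: Sequence[Sequence[float]], W: Sequence[Sequence[float]],
                   b: Sequence[float], u: Sequence[float]) -> Tuple[Vector, Vector]:
    """Assumed additive attention pooling over per-frame hidden states H (T x d).

    e_t     = u . tanh(W h_t + b)   # scalar saliency score for frame t
    alpha   = softmax(e)            # attention weights over frames, summing to 1
    context = sum_t alpha_t * h_t   # utterance-level summary vector
    """
    scores = []
    for h in H:
        hidden = [math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)]
        scores.append(sum(ui * hv for ui, hv in zip(u, hidden)))
    alpha = softmax(scores)
    d = len(H[0])
    context = [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(d)]
    return alpha, context


# Toy example: 4 frames with d = 2; frame 1 plays the role of an emotionally
# salient frame. These parameters are illustrative, not learned.
H = [[0.0, 0.0], [2.0, 2.0], [0.0, 0.0], [0.1, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
u = [1.0, 1.0]
alpha, context = attention_pool(H, W, b, u)
print([round(a, 3) for a in alpha])
print(max(range(len(alpha)), key=alpha.__getitem__))  # → 1 (most attended frame)
```

The `alpha` vector is what makes such a model inspectable: it assigns each frame a share of the utterance-level representation, so high-weight frames are natural candidates for emotionally salient regions. A classifier (for example, a softmax over emotion classes) would then be applied to `context`; that layer is omitted here.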
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Advanced Data Compression Techniques
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
