Synchronous Prediction of Arousal and Valence Using LSTM Network for Affective Video Content Analysis
Ligang Zhang, Jiulong Zhang

TL;DR
This paper introduces a novel LSTM-based method for simultaneously predicting arousal and valence in videos, leveraging the inherent correlation between the two to improve the accuracy of affective video content analysis.
Contribution
It presents the first approach to jointly predict arousal and valence using an LSTM network, exploiting the correlations between them to improve affective video content analysis.
Findings
LSTM-based approach outperforms traditional SVM methods
Joint prediction improves accuracy of affective dimension estimation
Correlations between affective dimensions enhance prediction performance
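One simple way to quantify the correlation that joint prediction is meant to exploit is Pearson's r between per-clip arousal and valence labels. The summary does not say how (or whether) the authors measure this correlation, so the sketch below is an assumption. The `pearson` helper is hypothetical, and the label values are invented placeholders, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical per-clip annotations (invented for illustration, not from the paper)
arousal = [0.2, 0.5, 0.7, 0.9, 0.4]
valence = [0.1, 0.4, 0.6, 0.8, 0.5]
r = pearson(arousal, valence)
print(f"r = {r:.3f}")
```

A value of r far from zero suggests that one dimension carries information about the other. Pearson's r only captures linear association, whereas a jointly trained network can in principle also pick up nonlinear dependence.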
Abstract
The affect embedded in video data conveys high-level semantic information about the content and has a direct impact on viewers' understanding and perception of it, as well as on their emotional responses. Affective Video Content Analysis (AVCA) attempts to generate a direct mapping between video content and the corresponding affective states, such as the arousal and valence dimensions. Most existing studies establish this mapping for each dimension separately, using knowledge-based rules or traditional classifiers such as the Support Vector Machine (SVM). The inherent correlations between affective dimensions, which are expected to carry important information for accurately predicting those dimensions, have remained largely unexploited. To address this issue, this paper presents an approach that predicts the arousal and valence dimensions synchronously using a Long Short-Term Memory (LSTM) network.…
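Synchronous prediction can be pictured as a single LSTM whose hidden state feeds one shared output layer with two units, so that arousal and valence are emitted together at every time step and a combined loss trains both from the same recurrent features. The following is a minimal, untrained sketch in plain Python under stated assumptions: per-time-step feature vectors as input, regression outputs, and a summed MSE loss. The names `JointAffectLSTM` and `joint_mse`, the layer sizes, and the weight initialisation are hypothetical and not taken from the paper, whose features and exact architecture are not given in this summary.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class JointAffectLSTM:
    """Hypothetical single-layer LSTM with a shared 2-unit head (arousal, valence)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        concat = input_dim + hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g)
        self.W = {k: rand_matrix(hidden_dim, concat, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        # Shared output layer: row 0 -> arousal, row 1 -> valence
        self.W_out = rand_matrix(2, hidden_dim, rng)
        self.b_out = [0.0, 0.0]

    def forward(self, sequence):
        """Return one (arousal, valence) pair per input time step."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in sequence:
            z = list(x) + h  # concatenate current features with previous hidden state
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["g"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            y = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
            outputs.append((y[0], y[1]))
        return outputs


def joint_mse(preds, targets):
    """Sum of per-dimension MSEs; in training, both terms would update the same LSTM."""
    n = len(preds)
    arousal_err = sum((p[0] - t[0]) ** 2 for p, t in zip(preds, targets)) / n
    valence_err = sum((p[1] - t[1]) ** 2 for p, t in zip(preds, targets)) / n
    return arousal_err + valence_err


model = JointAffectLSTM(input_dim=4, hidden_dim=8)
features = [[0.1, 0.2, 0.3, 0.4]] * 5  # 5 time steps of toy 4-d features
preds = model.forward(features)
print(len(preds), len(preds[0]))  # 5 2
```

The design point is that a separate model per dimension (as in the SVM baselines the abstract describes) never sees the other dimension's targets. A shared hidden state trained on a combined loss lets information about one dimension shape the representation used for the other.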
Taxonomy
Topics: Image and Video Quality Assessment · Video Analysis and Summarization · Media Influence and Health
