How you feelin'? Learning Emotions and Mental States in Movie Scenes
Dhruv Srivastava, Aditya Kumar Singh, Makarand Tapaswi

TL;DR
This paper introduces EmoTx, a multimodal Transformer model that predicts diverse emotions and mental states in movie scenes by analyzing videos, characters, and dialogue, advancing understanding of character psychology in film analysis.
Contribution
The paper presents EmoTx, a novel multimodal Transformer architecture for joint emotion and mental state prediction in movie scenes, leveraging the MovieGraphs dataset.
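The sketch below shows one way the inputs to a multimodal Transformer of this kind could be flattened into a single token sequence. It starts with classifier tokens for the scene and for each character, then adds video, character, and dialogue tokens, each tagged with its modality and character index. This is plain Python written for illustration, not the authors' code. The following are all assumptions:

- the `Token` class and the `build_sequence` function
- one classifier token per label
- per-character features (for example, from face tracks)
- zero placeholders standing in for learned embeddings

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    modality: str        # "scene_cls", "char_cls", "video", "character", or "dialog"
    char_index: int      # which character the token belongs to; -1 if none
    time_index: int      # frame/utterance position; -1 for classifier tokens
    feature: List[float] = field(default_factory=list)


def build_sequence(num_labels: int,
                   video_feats: List[List[float]],
                   char_feats: List[List[List[float]]],
                   dialog_feats: List[List[float]],
                   dim: int) -> List[Token]:
    """Flatten all modalities into one token sequence for a Transformer encoder.

    Hypothetical layout: one classifier token per label for the scene and one
    per label for every character, followed by the content tokens.
    """
    placeholder = [0.0] * dim  # stands in for a learned classifier embedding
    seq: List[Token] = []
    # Scene-level classifier tokens (one per label).
    seq += [Token("scene_cls", -1, -1, list(placeholder)) for _ in range(num_labels)]
    # Character-level classifier tokens (one per label, per character).
    for c in range(len(char_feats)):
        seq += [Token("char_cls", c, -1, list(placeholder)) for _ in range(num_labels)]
    # Content tokens: scene video frames, per-character features, dialogue utterances.
    seq += [Token("video", -1, t, f) for t, f in enumerate(video_feats)]
    for c, frames in enumerate(char_feats):
        seq += [Token("character", c, t, f) for t, f in enumerate(frames)]
    seq += [Token("dialog", -1, t, f) for t, f in enumerate(dialog_feats)]
    return seq


if __name__ == "__main__":
    dim = 4
    video = [[0.1] * dim for _ in range(3)]                         # 3 frames
    chars = [[[0.2] * dim for _ in range(2)], [[0.3] * dim]]        # 2 characters
    dialog = [[0.4] * dim for _ in range(2)]                        # 2 utterances
    seq = build_sequence(num_labels=10, video_feats=video,
                         char_feats=chars, dialog_feats=dialog, dim=dim)
    print(len(seq))  # prints 38: 10 scene + 20 character classifier tokens + 3 + 3 + 2
```

Tagging each token with its modality and character index would let a model add modality and character embeddings. It would also let one group attention weights by modality.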
Findings
EmoTx outperforms adapted state-of-the-art emotion recognition methods.
Self-attention analysis shows that expressive emotions often attend to character tokens, while other mental states rely more on video and dialogue cues.
Ablation studies confirm the effectiveness of multimodal inputs.
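The attention finding above can be made concrete with a small helper. It takes one classifier token's softmax attention row and a modality tag for each attended token, then sums the attention mass per modality. This is an illustrative analysis sketch only. The function names and the simple summation are assumptions, not the authors' analysis procedure, and the attention values in the usage block are toy numbers.

```python
from collections import defaultdict
from typing import Dict, List


def attention_by_modality(attn_row: List[float], modalities: List[str]) -> Dict[str, float]:
    """Sum one query token's attention weights within each modality group."""
    if len(attn_row) != len(modalities):
        raise ValueError("need exactly one modality tag per attended token")
    totals: Dict[str, float] = defaultdict(float)
    for weight, modality in zip(attn_row, modalities):
        totals[modality] += weight
    return dict(totals)


def dominant_modality(attn_row: List[float], modalities: List[str]) -> str:
    """Return the modality receiving the largest share of attention."""
    totals = attention_by_modality(attn_row, modalities)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Toy softmax row from one classifier token over six content tokens.
    row = [0.05, 0.10, 0.30, 0.35, 0.10, 0.10]
    tags = ["video", "video", "character", "character", "dialog", "dialog"]
    print(attention_by_modality(row, tags))
    print(dominant_modality(row, tags))  # prints: character
```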
Abstract
Movie story analysis requires understanding characters' emotions and mental states. Towards this goal, we formulate emotion understanding as predicting a diverse and multi-label set of emotions at the level of a movie scene and for each character. We propose EmoTx, a multimodal Transformer-based architecture that ingests videos, multiple characters, and dialog utterances to make joint predictions. By leveraging annotations from the MovieGraphs dataset, we aim to predict classic emotions (e.g. happy, angry) and other mental states (e.g. honest, helpful). We conduct experiments on the 10 and 25 most frequently occurring labels, and on a mapping that clusters 181 labels into 26. Ablation studies and comparison against adapted state-of-the-art emotion recognition approaches show the effectiveness of EmoTx. Analyzing EmoTx's self-attention scores reveals that expressive emotions often look at…
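The multi-label formulation described in the abstract can be sketched as follows. Each scene or character gets a 0/1 target vector over a label vocabulary, such as the top-10 or top-25 labels, or cluster names after a many-to-few mapping. Training uses one independent sigmoid per label with binary cross-entropy. The following are illustrative assumptions, since the abstract does not specify the loss:

- the vocabulary
- the toy cluster mapping
- the 0.5 threshold
- the equal weighting of scene and character losses

```python
import math
from typing import Dict, List, Sequence

# Toy vocabulary and cluster mapping for illustration only; these are NOT the
# paper's top-10/top-25 label lists or its actual 181-to-26 clustering.
VOCAB = ["happy", "angry", "honest", "helpful"]
TOY_CLUSTERS = {"cheerful": "happy", "happy": "happy", "furious": "angry", "angry": "angry"}


def to_clusters(labels: Sequence[str], cluster_of: Dict[str, str]) -> List[str]:
    """Map fine-grained labels to cluster names, dropping unmapped labels."""
    return sorted({cluster_of[name] for name in labels if name in cluster_of})


def multi_hot(labels: Sequence[str], vocab: List[str]) -> List[int]:
    """Encode a label set as a 0/1 target vector; out-of-vocabulary labels are ignored."""
    index = {name: i for i, name in enumerate(vocab)}
    target = [0] * len(vocab)
    for name in labels:
        if name in index:
            target[index[name]] = 1
    return target


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy with one independent sigmoid per label (stable form)."""
    losses = [max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
              for z, y in zip(logits, targets)]
    return sum(losses) / len(losses)


def predict(logits: Sequence[float], vocab: List[str], threshold: float = 0.5) -> List[str]:
    """Return every label whose sigmoid probability reaches the threshold."""
    return [name for z, name in zip(logits, vocab) if sigmoid(z) >= threshold]


def joint_loss(scene_logits: Sequence[float], scene_labels: Sequence[str],
               char_logits: List[Sequence[float]], char_labels: List[Sequence[str]],
               vocab: List[str]) -> float:
    """Average the scene loss and each character's loss (equal weighting is an assumption)."""
    terms = [bce_with_logits(scene_logits, multi_hot(scene_labels, vocab))]
    for logits, labels in zip(char_logits, char_labels):
        terms.append(bce_with_logits(logits, multi_hot(labels, vocab)))
    return sum(terms) / len(terms)
```

Independent sigmoids, rather than a single softmax, let a scene or character carry several labels at once, for example both happy and helpful.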
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Human Pose and Action Recognition · Emotion and Mood Recognition
