GEMS: Group Emotion Profiling Through Multimodal Situational Understanding
Anubhav Kataria, Surbhi Madan, Shreya Ghosh, Tom Gedeon, Abhinav Dhall

TL;DR
GEMS introduces a multimodal transformer-based framework for holistic emotion profiling at individual, group, and event levels, advancing multi-person social situation analysis with fine-grained emotion prediction.
Contribution
It proposes GEMS, a novel multimodal Swin Transformer architecture, and introduces VGAF-GEMS, an extension of the VGAF dataset that adds finer-grained emotion annotations on top of its existing group-level labels.
Findings
GEMS outperforms state-of-the-art models on the VGAF-GEMS benchmark.
The framework effectively predicts discrete and continuous emotions at multiple social levels.
Holistic emotion understanding is improved through multimodal data integration.
Abstract
Understanding individual-, group-, and event-level emotions along with contextual information is crucial for analyzing a multi-person social situation. To achieve this, we frame emotion comprehension as the task of predicting emotion from the fine-grained individual level to the coarse-grained group and event levels. We introduce GEMS, which leverages a multimodal Swin Transformer and S3Attention-based architecture that processes an input scene, group members, and context information to generate joint predictions. Existing multi-person emotion-related benchmarks mainly focus on atomic interactions, primarily based on emotion perception over time and at the group level. To this end, we extend the VGAF dataset and propose VGAF-GEMS, which provides a more fine-grained and holistic analysis on top of the existing group-level annotations. GEMS aims to predict basic discrete and continuous emotions (including valence and…
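To make the idea of joint predictions at three levels concrete, here is a minimal sketch of what a per-scene output could look like. It is an illustration only, not the authors' code: the class names, the label strings, the [-1, 1] valence range, and the helper `mean_individual_valence` are all assumptions introduced here, and only the discrete label and valence are modelled because the abstract is truncated at that point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EmotionPrediction:
    """One prediction: a discrete label plus a continuous valence score."""
    label: str      # discrete emotion; the label strings used below are illustrative
    valence: float  # continuous valence; the [-1, 1] range is an assumption


@dataclass
class SceneEmotionProfile:
    """Hypothetical container for joint predictions on one multi-person scene."""
    individuals: List[EmotionPrediction]  # fine-grained, one entry per group member
    group: EmotionPrediction              # coarse-grained group-level emotion
    event: EmotionPrediction              # coarse-grained event-level emotion


def mean_individual_valence(profile: SceneEmotionProfile) -> float:
    """Average valence over group members.

    This is only a sanity-check statistic for comparing levels; it is NOT
    how GEMS derives its group-level prediction.
    """
    if not profile.individuals:
        return 0.0
    return sum(p.valence for p in profile.individuals) / len(profile.individuals)


# Example scene with two detected group members (values are made up).
profile = SceneEmotionProfile(
    individuals=[
        EmotionPrediction(label="happy", valence=0.5),
        EmotionPrediction(label="neutral", valence=0.0),
    ],
    group=EmotionPrediction(label="positive", valence=0.25),
    event=EmotionPrediction(label="positive", valence=0.25),
)

print(len(profile.individuals), profile.group.label, mean_individual_valence(profile))
```

Keeping the three levels in one record mirrors the joint-prediction framing: a consumer can compare the individual-level scores against the group- and event-level outputs for the same scene without re-running the model.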
