Face-to-Face Contrastive Learning for Social Intelligence Question-Answering
Alex Wilf, Martin Q. Ma, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency

TL;DR
This paper introduces Face-to-Face Contrastive Learning (F2F-CL), a graph neural network that models social interactions in videos by capturing face-to-face dynamics across speaking turns, achieving state-of-the-art results on the Social-IQ dataset.
Contribution
The paper presents a novel graph neural network architecture with contrastive learning for modeling complex face-to-face social interactions in multimodal videos.
Findings
Achieves state-of-the-art performance on the Social-IQ dataset.
Effectively models conversational dynamics across speaking turns.
Demonstrates the utility of contrastive learning in social interaction modeling.
Abstract
Creating artificial social intelligence - algorithms that can understand the nuances of multi-person interactions - is an exciting and emerging challenge in processing facial expressions and gestures from multimodal videos. Recent multimodal methods have set the state of the art on many tasks, but have difficulty modeling the complex face-to-face conversational dynamics across speaking turns in social interaction, particularly in a self-supervised setup. In this paper, we propose Face-to-Face Contrastive Learning (F2F-CL), a graph neural network designed to model social interactions using factorization nodes to contextualize the multimodal face-to-face interaction along the boundaries of the speaking turn. With the F2F-CL model, we propose to perform contrastive learning between the factorization nodes of different speaking turns within the same video. We experimentally evaluated the…
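The contrastive objective mentioned in the abstract can be illustrated with a generic InfoNCE loss. This is only a minimal sketch, not the authors' implementation: the abstract does not say how positive and negative pairs are chosen, so the row-wise pairing, the `info_nce` function name, the temperature value, and the random stand-in embeddings are all assumptions made for illustration. Each row is imagined to be the embedding of one speaking turn's factorization node within a single video.

```python
import numpy as np


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (hypothetical helper, not from the paper).

    Row i of `positives` is treated as the positive for row i of
    `anchors`; every other row acts as a negative. This pairing is an
    assumption, since the abstract does not specify it.
    """
    # L2-normalise so the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature.
    logits = a @ p.T / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy against the diagonal (the matching pairs).
    return float(-np.mean(np.diag(log_probs)))


# Stand-in embeddings: 4 speaking turns, 8 dimensions each (assumed sizes).
rng = np.random.default_rng(0)
turn_nodes = rng.normal(size=(4, 8))

# Slightly perturbed copies play the role of matching positives.
aligned = turn_nodes + 0.01 * rng.normal(size=(4, 8))

# Reversing the row order mismatches every anchor with its positive.
shuffled = aligned[::-1]

loss_aligned = info_nce(turn_nodes, aligned)
loss_shuffled = info_nce(turn_nodes, shuffled)
print(loss_aligned < loss_shuffled)
```

The loss is low when each anchor is most similar to its own positive and high when the pairing is scrambled, which is the behaviour a contrastive objective over speaking-turn nodes would rely on.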
Taxonomy
Topics: Multimodal Machine Learning Applications · Emotion and Mood Recognition · Human Pose and Action Recognition
Methods: Graph Neural Network · Contrastive Learning
