From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition
Shiwei Wu, Chao Zhang, Joya Chen, Tong Xu, Likang Wu, Yao Hu, Enhong Chen

TL;DR
This paper introduces ConSoR, a novel social relationship recognition method that leverages social context and visual-linguistic contrastive learning, significantly improving accuracy over previous approaches.
Contribution
The paper proposes a social cognitive-inspired approach using a multi-modal adapter and descriptive prompts to enhance social relationship understanding from images.
Findings
Achieves 12.2% higher accuracy than previous approaches on the PISC dataset
Attains a 9.8% improvement on the PIPA benchmark
Effectively identifies critical visual cues for social relationships
Abstract
People's social relationships are often manifested through their surroundings, with certain objects or interactions acting as symbols for specific relationships, e.g., wedding rings, roses, hugs, or holding hands. This brings unique challenges to recognizing social relationships, requiring understanding and capturing the essence of these contexts from visual appearances. However, current methods of social relationship understanding rely on the basic classification paradigm of detected persons and objects, which fails to understand the comprehensive context and often overlooks decisive social factors, especially subtle visual cues. To highlight the social-aware context and intricate details, we propose a novel approach that recognizes Contextual Social Relationships (ConSoR) from a social cognitive perspective. Specifically, to incorporate social-aware…
Taxonomy
Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
Methods: Adapter · Contrastive Language-Image Pre-training
