From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

Shiwei Wu; Chao Zhang; Joya Chen; Tong Xu; Likang Wu; Yao Hu; Enhong Chen

arXiv:2406.08358·cs.CV·June 13, 2024

TL;DR

This paper introduces ConSoR, a novel social relationship recognition method that leverages social context and visual-linguistic contrastive learning, significantly improving accuracy over previous approaches.

Contribution

The paper proposes an approach inspired by social cognition that uses a multi-modal adapter and descriptive prompts to improve the recognition of social relationships in images.
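The visual-linguistic contrastive idea named above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a generic CLIP-style setup in which an image embedding is compared against one text embedding per descriptive relationship prompt, and the function names `relationship_probs` and `symmetric_contrastive_loss`, the temperature value, and the toy embeddings are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def relationship_probs(image_emb, prompt_embs, temperature=0.07):
    """Softmax over cosine similarities between one image and K prompt embeddings.

    Assumption: embeddings come from some image/text encoder pair; the
    temperature of 0.07 is an illustrative default, not a value from the paper.
    """
    image_emb = l2_normalize(np.asarray(image_emb, dtype=float))
    prompt_embs = l2_normalize(np.asarray(prompt_embs, dtype=float))
    logits = prompt_embs @ image_emb / temperature  # shape (K,)
    logits -= logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """CLIP-style symmetric cross-entropy over a batch of N matched pairs.

    Row i of image_embs is assumed to match row i of text_embs.
    """
    img = l2_normalize(np.asarray(image_embs, dtype=float))
    txt = l2_normalize(np.asarray(text_embs, dtype=float))
    logits = img @ txt.T / temperature  # shape (N, N)

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for three hypothetical prompts.
    prompts = np.eye(3)
    image = np.array([0.1, 0.9, 0.2])
    print(relationship_probs(image, prompts))
    print(symmetric_contrastive_loss(np.eye(3), np.eye(3)))
```

In this sketch the predicted relationship is simply the prompt with the highest probability. How the paper's multi-modal adapter and social context feed into the embeddings is not specified in this summary and is left out.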

Findings

01. Achieves 12.2% higher accuracy on the PISC dataset

02. Attains a 9.8% improvement on the PIPA benchmark

03. Effectively identifies critical visual cues for social relationships

Abstract

People's social relationships are often manifested through their surroundings, with certain objects or interactions acting as symbols for specific relationships, e.g., wedding rings, roses, hugs, or holding hands. This brings unique challenges to recognizing social relationships, requiring understanding and capturing the essence of these contexts from visual appearances. However, current methods of social relationship understanding rely on the basic classification paradigm of detected persons and objects, which fails to understand the comprehensive context and often overlooks decisive social factors, especially subtle visual cues. To highlight the social-aware context and intricate details, we propose a novel approach that recognizes Contextual Social Relationships (ConSoR) from a social cognitive perspective. Specifically, to incorporate social-aware…

Taxonomy

Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Video Surveillance and Tracking Methods

Methods: Adapter · Contrastive Language-Image Pre-training