SocialFusion: Addressing Social Degradation in Pre-trained Vision-Language Models

Hamza Tahboub; Weiyan Shi; Gang Hua; Huaizu Jiang

arXiv:2512.01148 · cs.CV · February 3, 2026

Open Access

TL;DR

This paper identifies "social degradation" in current vision-language models, whereby general visual-linguistic pre-training impairs the visual encoder's ability to represent nuanced social information. It proposes SocialFusion, a framework that improves social perception across multiple tasks by minimally connecting a frozen visual encoder with a language model.

Contribution

The paper introduces SocialFusion, a novel framework that mitigates social degradation in VLMs by minimally connecting a frozen visual encoder with a language model, enabling positive transfer across social tasks.

Findings

01. SocialFusion outperforms existing VLMs on five social perception tasks.

02. SocialFusion achieves performance comparable to task-specific models on the evaluated benchmarks.

03. Current pre-training strategies may hinder social competence acquisition.

Abstract

Understanding social interactions from visual cues is a fundamental challenge for a socially competent AI. While powerful pre-trained vision-language models (VLMs) have shown remarkable general capabilities, they surprisingly struggle to unify and learn multiple social perception tasks simultaneously, often exhibiting negative transfer. We identify that this negative transfer stems from a critical issue we term "social degradation," whereby the general visual-linguistic pre-training process of VLMs impairs the visual encoder's ability to represent nuanced social information. We investigate this behavior further under two lenses: decodability through linear representation probing and compatibility through gradient conflict analysis, revealing that both play a role in the degradation, especially the former, which is significantly compromised in the VLM pre-training process. To address…
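The two lenses named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes gradient conflict is measured as the cosine similarity between two tasks' flattened gradients (a common choice, where negative values indicate conflict), and it uses a closed-form ridge-regression probe as a simple stand-in for whatever linear probe the paper trains. The function names `gradient_cosine`, `fit_linear_probe`, and `probe_accuracy` and the synthetic features are all hypothetical.

```python
import numpy as np


def gradient_cosine(grad_a, grad_b):
    """Cosine similarity between two flattened task gradients.

    Negative values mean the two tasks pull shared parameters in
    opposing directions (gradient conflict). Assumed metric, not
    necessarily the one used in the paper.
    """
    a = np.asarray(grad_a, dtype=float).ravel()
    b = np.asarray(grad_b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def _with_bias(features):
    """Append a constant column so the probe can learn an offset."""
    features = np.asarray(features, dtype=float)
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_linear_probe(features, labels, num_classes, l2=1e-3):
    """Fit a linear map from frozen features to one-hot labels.

    Uses closed-form ridge regression; the encoder that produced
    `features` is never updated, which is what makes this a probe.
    """
    x = _with_bias(features)
    y = np.eye(num_classes)[np.asarray(labels)]
    gram = x.T @ x + l2 * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)


def probe_accuracy(weights, features, labels):
    """Fraction of samples whose arg-max probe output matches the label."""
    scores = _with_bias(features) @ weights
    return float(np.mean(np.argmax(scores, axis=1) == np.asarray(labels)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for frozen encoder features: the label is
    # linearly readable from the first feature dimension.
    labels = rng.integers(0, 2, size=200)
    features = rng.normal(size=(200, 8))
    features[:, 0] += 6.0 * labels

    weights = fit_linear_probe(features, labels, num_classes=2)
    print("probe accuracy:", probe_accuracy(weights, features, labels))
    print("gradient cosine:", gradient_cosine([1.0, 0.0], [-1.0, 0.0]))
```

Under these assumptions, a low probe accuracy on an encoder's features would suggest that social information is hard to decode linearly, while a negative gradient cosine between two social tasks would suggest they are incompatible when trained jointly.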

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)