Auditing Disability Representation in Vision-Language Models
Srikant Panda, Sourabh Singh Yadav, Palkesh Malviya

TL;DR
This paper evaluates how vision-language models describe people when a prompt mentions disability. It finds that disability-contextualised prompts often cause interpretive shifts and biases, and that targeted fine-tuning can improve fidelity.
Contribution
Introduces a benchmark for assessing disability representation in VLMs and proposes methods to mitigate interpretive shifts and biases.
Findings
Disability context degrades interpretive fidelity in VLMs.
Biases are amplified along race and gender dimensions.
Fine-tuning improves interpretive fidelity and reduces shifts.
Abstract
Vision-language models (VLMs) are increasingly deployed in socially sensitive applications, yet their behavior with respect to disability remains underexplored. We study disability-aware descriptions for person-centric images, where models often transition from evidence-grounded factual description to interpretation shift, including the introduction of unsupported inferences beyond observable visual evidence. To systematically analyze this phenomenon, we introduce a benchmark based on paired Neutral Prompts (NP) and Disability-Contextualised Prompts (DP) and evaluate 15 state-of-the-art open- and closed-source VLMs under a zero-shot setting across 9 disability categories. Our evaluation framework treats interpretive fidelity as the core objective and combines standard text-based metrics capturing affective degradation through shifts in sentiment, social regard and response length with an…
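The paired NP/DP design described above lends itself to a simple per-image difference. The following is a minimal sketch, not the authors' implementation: it assumes each image yields one response per prompt type and that a "shift" is the mean of score(DP) minus score(NP). The names `PairedResponse`, `word_count`, and `paired_shift` are hypothetical, and word count stands in for response length only; a sentiment or social-regard scorer could be passed in the same way.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class PairedResponse:
    """Two model outputs for one image (hypothetical container, not from the paper)."""
    neutral: str      # response to the Neutral Prompt (NP)
    disability: str   # response to the Disability-Contextualised Prompt (DP)


def word_count(text: str) -> float:
    """Response length measured as whitespace-separated tokens (an assumption)."""
    return float(len(text.split()))


def paired_shift(
    pairs: Sequence[PairedResponse],
    score: Callable[[str], float],
) -> float:
    """Mean of score(DP) - score(NP) across all pairs.

    A positive value means the metric is higher under the
    disability-contextualised prompt than under the neutral one.
    """
    if not pairs:
        raise ValueError("need at least one NP/DP pair")
    return mean(score(p.disability) - score(p.neutral) for p in pairs)
```

Calling `paired_shift(pairs, word_count)` gives the average change in response length. Because the scorer is a parameter, the same function can be reused for any text metric that maps a response to a number.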
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Social Robot Interaction and HRI
