FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

Sophie Chiang; Tom Brennan; Fethiye Irmak Dogan; Jiaee Cheong; Hatice Gunes

arXiv:2604.23786 · cs.AI · April 28, 2026

TL;DR

This paper investigates the fairness and explainability of Vision-Language Models in wellbeing assessment, revealing performance variability, biases, and the complex effects of explainability interventions on fairness.

Contribution

It introduces an explainability framework for VLMs in wellbeing assessment, analyzing biases and fairness trade-offs across datasets and architectures.

Findings

01. Phi3.5-Vision achieved 80.4% accuracy on E-DAIC.

02. Qwen2-VL reached only 33.9% accuracy on E-DAIC and showed higher gender bias.

03. Explainability interventions had mixed effects on fairness and bias.
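The findings above report accuracy alongside gender bias. As a rough illustration of how the two can be measured together, the following minimal sketch computes overall accuracy, accuracy per demographic group, and the gap between the best and worst group. This summary does not state which fairness metric the paper uses, so the accuracy gap here is an assumption chosen for simplicity. The function names (`group_accuracy`, `accuracy_gap`) and the toy labels are hypothetical and are not taken from the paper or its datasets.

```python
from collections import defaultdict


def group_accuracy(y_true, y_pred, groups):
    """Return (overall accuracy, {group: accuracy}) for paired labels.

    Hypothetical helper for illustration only; not the paper's code.
    """
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have equal length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)

    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / len(y_true)
    return overall, per_group


def accuracy_gap(per_group):
    """Difference between the best and worst per-group accuracy.

    An assumed, simple disparity measure; 0.0 means equal accuracy.
    """
    return max(per_group.values()) - min(per_group.values())


# Toy data, invented for this example (1 = depressed, 0 = not depressed).
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

overall, per_group = group_accuracy(y_true, y_pred, groups)
gap = accuracy_gap(per_group)
print(f"overall={overall:.3f} per_group={per_group} gap={gap:.3f}")
```

A single headline accuracy can hide this kind of disparity, which is why the per-group breakdown is reported separately. Other fairness criteria compare different quantities (for example, positive prediction rates or true positive rates per group) and can rank models differently.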

Abstract

In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision-Language Models (VLMs), their deployment in clinical settings has raised concerns due to their lack of transparency and potential for bias. While previous research has explored the intersection of fairness and Explainable AI (XAI), its application to VLMs for wellbeing assessment and depression prediction remains under-explored. This work investigates VLM performance across laboratory (AFAR-BSFT) and naturalistic (E-DAIC) datasets, focusing on diagnostic reliability and demographic fairness. Performance varied substantially across environments and architectures; Phi3.5-Vision achieved 80.4% accuracy on E-DAIC, while Qwen2-VL struggled at 33.9%. Additionally, both models demonstrated a…
