Toward Robust Medical Fairness: Debiased Dual-Modal Alignment via Text-Guided Attribute-Disentangled Prompt Learning for Vision-Language Models

Yuexuan Xia; Benteng Ma; Jiang He; Zhiyong Wang; Qi Dou; Yong Xia

arXiv:2508.18886 · cs.CV · August 27, 2025

TL;DR

This paper introduces DualFairVL, a multimodal prompt-learning framework that jointly debiases and aligns vision-language representations to improve the fairness and robustness of medical diagnosis, evaluated on eight datasets.

Contribution

The paper proposes a parallel dual-branch architecture that disentangles sensitive attributes from target attributes, together with a hypernetwork that generates instance-aware prompts, to improve fairness in medical vision-language models.

Findings

01. Achieves state-of-the-art fairness and accuracy on eight datasets.

02. Outperforms full fine-tuning and parameter-efficient baselines.

03. Operates effectively with only 3.6 million trainable parameters.

Abstract

Ensuring fairness across demographic groups in medical diagnosis is essential for equitable healthcare, particularly under distribution shifts caused by variations in imaging equipment and clinical practice. Vision-language models (VLMs) exhibit strong generalization, and text prompts encode identity attributes, enabling explicit identification and removal of sensitive directions. However, existing debiasing approaches typically address vision and text modalities independently, leaving residual cross-modal misalignment and fairness gaps. To address this challenge, we propose DualFairVL, a multimodal prompt-learning framework that jointly debiases and aligns cross-modal representations. DualFairVL employs a parallel dual-branch architecture that separates sensitive and target attributes, enabling disentangled yet aligned representations across modalities. Approximately orthogonal text…
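The abstract's notion of identifying and removing a sensitive direction can be illustrated with a plain orthogonal projection. The following is a minimal sketch under stated assumptions, not DualFairVL's actual procedure: the helper names `unit_difference` and `remove_direction` are hypothetical, the embeddings are toy vectors rather than outputs of a real VLM, and the sensitive direction is assumed to be the normalized difference between two prompt embeddings that differ only in an identity attribute.

```python
import math


def unit_difference(emb_a, emb_b):
    """Unit vector pointing from emb_b to emb_a.

    Assumption: the two prompts differ only in a sensitive attribute,
    so their difference approximates the sensitive direction.
    """
    diff = [a - b for a, b in zip(emb_a, emb_b)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("embeddings are identical; no direction to remove")
    return [x / norm for x in diff]


def remove_direction(feature, direction):
    """Project a feature onto the subspace orthogonal to a unit direction."""
    coeff = sum(f * d for f, d in zip(feature, direction))
    return [f - coeff * d for f, d in zip(feature, direction)]


# Toy 4-dimensional embeddings (illustrative values only).
prompt_a = [0.9, 0.1, 0.3, 0.2]   # e.g. a prompt naming one demographic group
prompt_b = [0.1, 0.1, 0.3, 0.2]   # same prompt naming another group
image_feature = [0.5, 0.7, -0.2, 0.4]

direction = unit_difference(prompt_a, prompt_b)
debiased = remove_direction(image_feature, direction)

# The debiased feature has (numerically) no component left along the direction.
residual = sum(f * d for f, d in zip(debiased, direction))
```

Components orthogonal to the sensitive direction pass through unchanged, so only the part of the feature that aligns with the identity attribute is discarded. A single fixed projection like this treats one modality in isolation, which is the limitation the abstract attributes to existing debiasing approaches.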
