On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI

David Restrepo; Ira Ktena; Maria Vakalopoulou; Stergios Christodoulidis; Enzo Ferrante

arXiv:2508.00171 · cs.CV · August 4, 2025

Open Access

TL;DR

This paper introduces Selective Modality Shifting, a perturbation-based method to diagnose and quantify modality biases in multimodal clinical AI models, revealing a tendency to rely on textual information at the expense of visual cues.

Contribution

The work presents a novel approach to systematically assess modality reliance in Vision-Language Models for medical data, highlighting the need for genuine multimodal integration.

Findings

01. Models predominantly depend on text over images.

02. The bias toward the textual modality persists even when visual information is available.

03. Qualitative analysis confirms that image content is overshadowed by text.

Abstract

Clinical decision-making relies on the integrated analysis of medical images and the associated clinical reports. While Vision-Language Models (VLMs) can offer a unified framework for such tasks, they can exhibit strong biases toward one modality, frequently overlooking critical visual cues in favor of textual information. In this work, we introduce Selective Modality Shifting (SMS), a perturbation-based approach to quantify a model's reliance on each modality in binary classification tasks. By systematically swapping images or text between samples with opposing labels, we expose modality-specific biases. We assess six open-source VLMs (four generalist models and two fine-tuned for medical data) on two medical imaging datasets with distinct modalities: MIMIC-CXR (chest X-ray) and FairVLMed (scanning laser ophthalmoscopy). By assessing model performance and the calibration of every model…
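
The swapping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Sample` type, the `shift_modality` and `accuracy` helpers, the toy data, and the `text_only_model` stand-in are all hypothetical, and choosing a random donor while keeping the original label is an assumption about how the swap might be done. Accuracy is used here as one possible performance measure.

```python
import random
from typing import Callable, List, NamedTuple


class Sample(NamedTuple):
    image: str   # placeholder for image data (hypothetical)
    text: str    # placeholder for the clinical report (hypothetical)
    label: int   # binary label: 0 or 1


def shift_modality(samples: List[Sample], modality: str, seed: int = 0) -> List[Sample]:
    """Replace one modality of every sample with that of a random
    opposite-label sample. Labels are kept unchanged (an assumption)."""
    if modality not in ("image", "text"):
        raise ValueError("modality must be 'image' or 'text'")
    rng = random.Random(seed)
    by_label = {y: [s for s in samples if s.label == y] for y in (0, 1)}
    shifted = []
    for s in samples:
        donor = rng.choice(by_label[1 - s.label])  # sample with the opposing label
        if modality == "image":
            shifted.append(s._replace(image=donor.image))
        else:
            shifted.append(s._replace(text=donor.text))
    return shifted


def accuracy(predict: Callable[[str, str], int], samples: List[Sample]) -> float:
    """Fraction of samples whose prediction matches the original label."""
    return sum(predict(s.image, s.text) == s.label for s in samples) / len(samples)


# Toy data and a deliberately text-biased stand-in model (both hypothetical).
samples = [
    Sample("xray_pos_1", "report: finding present", 1),
    Sample("xray_pos_2", "report: finding present", 1),
    Sample("xray_neg_1", "report: no finding", 0),
    Sample("xray_neg_2", "report: no finding", 0),
]


def text_only_model(image: str, text: str) -> int:
    # Ignores the image entirely and reads the label off the report.
    return 0 if "no finding" in text else 1


acc_original = accuracy(text_only_model, samples)
acc_image_shifted = accuracy(text_only_model, shift_modality(samples, "image"))
acc_text_shifted = accuracy(text_only_model, shift_modality(samples, "text"))
print(acc_original, acc_image_shifted, acc_text_shifted)
```

In this toy setup, a model that reads only the report is unaffected when images are swapped and fails completely when reports are swapped. A gap of that kind between the two shifted conditions is the sort of signal a perturbation-based diagnosis would look for.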

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning