Cause-Effect Driven Optimization for Robust Medical Visual Question Answering with Language Biases

Huanjia Zhu; Yishu Liu; Xiaozhao Fang; Guangming Lu; Bingzhi Chen

arXiv:2506.17903 · cs.CV · June 24, 2025

TL;DR

This paper introduces CEDO, a comprehensive framework that reduces language biases in medical visual question answering by employing modality-specific optimization, gradient-based synergy, and adaptive loss rescaling, leading to more robust reasoning.

Contribution

The paper proposes a novel Cause-Effect Driven Optimization framework with three mechanisms to mitigate language biases from causal and effectual perspectives in Med-VQA models.

Findings

01. CEDO outperforms state-of-the-art methods on multiple benchmarks.

02. The framework effectively reduces shortcut and dataset imbalance biases.

03. Extensive experiments validate the robustness of CEDO across various datasets.

Abstract

Existing Medical Visual Question Answering (Med-VQA) models often suffer from language biases, where spurious correlations between question types and answer categories are inadvertently established. To address these issues, we propose a novel Cause-Effect Driven Optimization framework, called CEDO, that incorporates three well-established mechanisms, i.e., Modality-driven Heterogeneous Optimization (MHO), Gradient-guided Modality Synergy (GMS), and Distribution-adapted Loss Rescaling (DLR), to comprehensively mitigate language biases from both causal and effectual perspectives. Specifically, MHO employs adaptive learning rates for specific modalities to achieve heterogeneous optimization, thus enhancing robust reasoning capabilities. Additionally, GMS leverages the Pareto optimization method to foster synergistic interactions between modalities and enforce gradient orthogonality to…
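
The abstract names two ideas that a small example may help make concrete: modality-specific learning rates (MHO) and gradient orthogonality (GMS). The sketch below is only an illustration under stated assumptions, not the authors' implementation. The abstract is truncated and does not say how CEDO adapts the rates or which gradients are orthogonalized, so the code uses fixed per-modality rates in a plain gradient-descent step and a standard vector projection, and it does not implement the Pareto optimization step. The names `heterogeneous_sgd_step` and `project_orthogonal` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def heterogeneous_sgd_step(
    params: Dict[str, Vector],
    grads: Dict[str, Vector],
    lrs: Dict[str, float],
) -> Dict[str, Vector]:
    """One gradient-descent step where each modality has its own learning rate.

    Assumption: the rates are fixed constants here; the paper describes
    them as adaptive, but the abstract does not give the adaptation rule.
    """
    return {
        name: [p - lrs[name] * g for p, g in zip(params[name], grads[name])]
        for name in params
    }


def project_orthogonal(g: Vector, ref: Vector) -> Vector:
    """Remove from g its component along ref, so the result is orthogonal to ref.

    Assumption: this is a generic projection, used only to show what
    "gradient orthogonality" means geometrically.
    """
    denom = dot(ref, ref)
    if denom == 0.0:
        # A zero reference vector defines no direction; leave g unchanged.
        return list(g)
    scale = dot(g, ref) / denom
    return [gi - scale * ri for gi, ri in zip(g, ref)]


# Usage: the language branch takes a smaller step than the vision branch.
params = {"vision": [1.0], "language": [1.0]}
grads = {"vision": [1.0], "language": [1.0]}
lrs = {"vision": 0.5, "language": 0.25}
updated = heterogeneous_sgd_step(params, grads, lrs)

# Usage: strip the part of one gradient that points along another.
g_orth = project_orthogonal([1.0, 1.0], [1.0, 0.0])
```

In this toy setup the same gradient moves the vision parameters twice as far as the language parameters, which is one simple way to keep a fast-learning modality from dominating training.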
