BiomedAP: A Vision-Informed Dual-Anchor Framework with Gated Cross-Modal Fusion for Robust Medical Vision-Language Adaptation

Huanyang Tong; Kai Liu; Fangjun Kuang; Huiling Chen

arXiv:2605.15736·cs.CV·May 18, 2026

TL;DR

BiomedAP is a vision-informed dual-anchor framework with gated cross-modal fusion that improves robustness and accuracy in medical vision-language tasks, especially under prompt variations.

Contribution

The paper proposes a dual-anchor constraint and gated cross-modal fusion that together improve cross-modal alignment and stability in biomedical vision-language models.

Findings

01. Outperforms baselines across 11 benchmarks.

02. Achieves robust few-shot accuracy.

03. Significantly improves stability under prompt perturbations.

Abstract

Biomedical Vision-Language Models (VLMs) have shown remarkable promise in few-shot medical diagnosis but face a critical bottleneck: fragility to prompt variations. Existing adaptation frameworks typically optimize visual and textual prompts as independent streams, relying on ideal "Golden Prompts". In clinical reality, where descriptions are often noisy and heterogeneous, this modality isolation leads to unstable cross-modal alignment. To address this, we propose BiomedAP, a vision-informed dual-anchor framework with gated cross-modal fusion. BiomedAP enforces synergistic alignment through two mechanisms: (1) Gated Cross-Modal Fusion, which enables layer-wise interaction between modalities, acting as a dynamic noise regulator to suppress irrelevant textual cues; and (2) a Dual-Anchor Constraint that regularizes learnable prompts toward stable semantic centroids derived from…
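To make the two mechanisms named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `gated_fusion` and `anchor_penalty`, the per-dimension gate parameterisation, and the squared-distance penalty are all assumptions chosen for illustration. The abstract is cut off before it says what the semantic centroids are derived from, so the anchors here are simply supplied by the caller.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, vis_vec, gate_w, gate_b):
    """Blend a text prompt vector with a visual vector, one gate per dimension.

    gate_w holds one (w_text, w_vis) pair per dimension and gate_b one bias
    per dimension. This parameterisation is hypothetical, not the paper's.
    A gate near 1 keeps the text value; a gate near 0 replaces it with the
    visual value, which is one way to damp an unreliable textual cue.
    """
    fused = []
    for t, v, (w_t, w_v), b in zip(text_vec, vis_vec, gate_w, gate_b):
        g = sigmoid(w_t * t + w_v * v + b)
        fused.append(g * t + (1.0 - g) * v)
    return fused


def anchor_penalty(prompt, anchors, weights):
    """Weighted sum of squared distances from a prompt to each anchor.

    The anchors stand in for the "stable semantic centroids" of the
    abstract; how they are computed is not specified here (assumption).
    """
    total = 0.0
    for anchor, w in zip(anchors, weights):
        total += w * sum((p - a) ** 2 for p, a in zip(prompt, anchor))
    return total


if __name__ == "__main__":
    text = [2.0, -1.0]
    vis = [4.0, 3.0]
    # Zero weights and biases give a gate of 0.5 in every dimension.
    print(gated_fusion(text, vis, [(0.0, 0.0)] * 2, [0.0, 0.0]))
    # Two caller-supplied anchors with different weights.
    print(anchor_penalty([1.0, 0.0], [[0.0, 0.0], [1.0, 1.0]], [1.0, 0.5]))
```

In a layer-wise setting, a function like `gated_fusion` would be applied at each transformer layer with its own gate parameters, and a term like `anchor_penalty` would be added to the training loss; both of these wiring choices are likewise assumptions rather than details taken from the paper.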

Code & Models

Repositories

tongdiedie/BiomedAP (GitHub)