Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Ta Duc Huy, Duy Anh Huynh, Yutong Xie, Yuankai Qi, Qi Chen, Phi Le Nguyen, Sen Kim Tran, Son Lam Phung, Anton van den Hengel, Zhibin Liao, Minh-Son To, Johan W. Verjans, Vu Minh Hieu Phan

TL;DR
This paper improves medical visual grounding by introducing Disease-Aware Prompting, which enhances focus on disease regions in images, leading to significant accuracy gains without extra annotations.
Contribution
The paper proposes Disease-Aware Prompting (DAP), a novel method that leverages explainability maps to improve disease region identification in medical images.
Findings
DAP improves grounding accuracy by 20.74% over state-of-the-art methods.
Current vision-language models (VLMs) assign high norms to background tokens, diverting attention away from disease regions.
The global tokens used for cross-modal learning are not representative of local disease tokens.
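The last two findings can be probed with simple token statistics. The sketch below is illustrative only and is not the authors' code: it assumes patch tokens are available as plain lists of floats, that a hypothetical boolean `disease_mask` marks which patches overlap the disease region, and that a mean-pooled vector stands in for the model's global token. All function names are made up for this example.

```python
import math

def l2_norm(vec):
    """Euclidean norm of one token embedding."""
    return math.sqrt(sum(x * x for x in vec))

def mean_vector(tokens):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def split_tokens(tokens, disease_mask):
    """Split tokens into (background, disease) using the boolean mask."""
    background = [t for t, m in zip(tokens, disease_mask) if not m]
    disease = [t for t, m in zip(tokens, disease_mask) if m]
    return background, disease

def norm_gap(tokens, disease_mask):
    """Mean background-token norm minus mean disease-token norm.

    A large positive value means background tokens carry higher norms.
    Assumes both groups are non-empty.
    """
    background, disease = split_tokens(tokens, disease_mask)
    mean_bg = sum(l2_norm(t) for t in background) / len(background)
    mean_dis = sum(l2_norm(t) for t in disease) / len(disease)
    return mean_bg - mean_dis

def global_local_similarity(tokens, disease_mask):
    """Cosine similarity between a mean-pooled 'global' vector (an assumed
    stand-in for the model's global token) and the mean disease token."""
    _, disease = split_tokens(tokens, disease_mask)
    g, d = mean_vector(tokens), mean_vector(disease)
    dot = sum(x * y for x, y in zip(g, d))
    return dot / (l2_norm(g) * l2_norm(d))

# Toy data: four 2-D patch tokens, two of them inside the disease region.
toy_tokens = [[3.0, 4.0], [0.0, 1.0], [6.0, 8.0], [1.0, 0.0]]
toy_mask = [False, True, False, True]
gap = norm_gap(toy_tokens, toy_mask)  # 6.5 for this toy data
sim = global_local_similarity(toy_tokens, toy_mask)
```

On real embeddings, a large `norm_gap` together with a low `global_local_similarity` would be consistent with the two observations, though the paper's own measurement protocol may differ.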
Abstract
Visual grounding (VG) is the capability to identify the specific regions in an image associated with a particular text description. In medical imaging, VG enhances interpretability by highlighting relevant pathological features corresponding to textual descriptions, improving model transparency and trustworthiness for wider adoption of deep learning models in clinical practice. Current models struggle to associate textual descriptions with disease regions due to inefficient attention mechanisms and a lack of fine-grained token representations. In this paper, we empirically demonstrate two key observations. First, current VLMs assign high norms to background tokens, diverting the model's attention from regions of disease. Second, the global tokens used for cross-modal learning are not representative of local disease tokens. This hampers identifying correlations between the text and…
Taxonomy
Topics: Telemedicine and Telehealth Implementation · Empathy and Medical Education · AI in cancer detection
Methods: Softmax · Attention Is All You Need
