Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding
Ke Zou, Yang Bai, Bo Liu, Yidi Chen, Zhihao Chen, Yang Zhou, Xuedong Yuan, Meng Wang, Xiaojing Shen, Xiaochun Cao, Yih Chung Tham, Huazhu Fu

TL;DR
This paper introduces uMedGround, a novel uncertainty-aware framework for end-to-end medical report grounding that improves diagnostic phrase detection and grounding in medical images, outperforming existing methods and supporting clinical decision-making.
Contribution
The paper pioneers the Medical Report Grounding task and proposes uMedGround, integrating uncertainty estimation and multimodal large language models for robust phrase detection and grounding.
Findings
uMedGround outperforms state-of-the-art methods in medical phrase grounding.
The framework effectively integrates uncertainty estimation for reliable predictions.
The framework also applies to medical visual question answering and class localization.
Abstract
Medical phrase grounding is crucial for identifying relevant regions in medical images based on phrase queries, facilitating accurate image analysis and diagnosis. However, current methods rely on manual extraction of key phrases from medical reports, reducing efficiency and increasing the workload for clinicians. Additionally, the lack of model confidence estimation limits clinical trust and usability. In this paper, we introduce a novel task called Medical Report Grounding (MRG), which aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. To address this challenge, we propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases by embedding a unique token, <BOX>, into the vocabulary to enhance detection capabilities. A vision…
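The abstract mentions adding a unique `<BOX>` token to the model vocabulary. As a rough illustration of that idea only, the sketch below extends a toy vocabulary with the token and finds where it occurs in a generated sequence. The vocabulary, the token ids, and the helper names `add_special_token` and `box_positions` are hypothetical and are not taken from the paper. What the model does at those positions (for example, how a grounding box is produced) is not covered by the excerpt above and is left out.

```python
from typing import Dict, List

BOX_TOKEN = "<BOX>"  # special token named in the abstract


def add_special_token(vocab: Dict[str, int], token: str) -> int:
    """Add `token` to `vocab` if it is missing and return its id."""
    if token not in vocab:
        vocab[token] = len(vocab)  # next free id
    return vocab[token]


def box_positions(token_ids: List[int], box_id: int) -> List[int]:
    """Return the indices at which the box token occurs in a sequence."""
    return [i for i, t in enumerate(token_ids) if t == box_id]


# Toy vocabulary and output sequence (hypothetical, for illustration only).
vocab = {"<pad>": 0, "left": 1, "pleural": 2, "effusion": 3}
box_id = add_special_token(vocab, BOX_TOKEN)

# A made-up generated sequence: a diagnostic phrase followed by <BOX>.
generated = [vocab["left"], vocab["pleural"], vocab["effusion"], box_id]
positions = box_positions(generated, box_id)
```

In a real multimodal language model, the tokenizer and the embedding table would both need to be extended when a token is added; this toy version only tracks the id mapping.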
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods
