TL;DR
This paper introduces Dynamic Decision Learning (DDL), an inference-time method that improves rare-disease abnormality localization in medical images by refining the decisions of frozen large vision-language models, yielding more accurate localization and better-calibrated confidence.
Contribution
The paper presents DDL, a novel inference-time refinement framework for frozen large vision-language models (LVLMs) that targets rare-disease abnormality grounding and improves both localization accuracy and reliability.
Findings
DDL improves mAP@75 by up to 105% on rare-disease cases.
DDL outperforms adaptation baselines and supervised fine-tuning.
DDL enhances calibration between reliability scores and localization accuracy.
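For readers unfamiliar with the metric, mAP@75 counts a predicted box as correct only when its intersection over union (IoU) with the ground-truth box is at least 0.75. The sketch below shows that per-box check only. The function names and the `(x1, y1, x2, y2)` box layout are illustrative assumptions, not taken from the paper, and the full metric additionally averages precision over ranked predictions and classes.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_hit_at_75(pred_box, gt_box):
    """True when the prediction meets the IoU >= 0.75 criterion (hypothetical helper)."""
    return iou(pred_box, gt_box) >= 0.75
```

The 0.75 threshold is strict: a prediction that covers the lesion but is noticeably too large or shifted is scored as a miss.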
Abstract
Clinical abnormality grounding for rare diseases is often hindered by data scarcity, making supervised fine-tuning impractical and single-pass inference highly unstable. We propose Dynamic Decision Learning (DDL), a framework that enables frozen large vision-language models (LVLMs) to refine their decisions across both language and visual spaces by optimizing instructions and consolidating predictions under visual perturbations. This process improves localization quality and produces a consensus-based reliability score that quantifies model confidence. Results on brain imaging benchmarks, including a rare-disease dataset with 281 pathology types across models ranging from 3B to 72B parameters, show that DDL improves mAP@75 by up to 105% on rare-disease cases and outperforms adaptation baselines and supervised fine-tuning. Furthermore, DDL demonstrates stronger calibration between…
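The abstract does not spell out how predictions are consolidated or how the reliability score is computed. The sketch below is therefore only one plausible reading, under stated assumptions: boxes predicted on perturbed views have already been mapped back to original image coordinates, the consensus box is the coordinate-wise median, and reliability is the mean IoU between each prediction and that consensus. The name `consolidate` and both choices are hypothetical, not the authors' method.

```python
from statistics import mean, median


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def consolidate(boxes):
    """Merge boxes predicted under different perturbations (assumed scheme).

    Returns (consensus_box, reliability), where reliability lies in [0, 1]
    and is higher when the individual predictions agree with the consensus.
    """
    if not boxes:
        raise ValueError("need at least one predicted box")
    # Coordinate-wise median is robust to a minority of outlier predictions.
    consensus = tuple(median(b[i] for b in boxes) for i in range(4))
    # Agreement of each prediction with the consensus box.
    reliability = mean(iou(b, consensus) for b in boxes)
    return consensus, reliability
```

Under this assumed scheme, identical predictions across perturbations give a reliability of 1.0, while a single stray prediction among three lowers the score without moving the consensus box.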
