A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding

Christina Liu; Alan Q. Wang; Joy Hsu; Jiajun Wu; Ehsan Adeli

arXiv:2512.21414 · cs.CV · December 29, 2025

TL;DR

The paper introduces the Tool Bottleneck Framework (TBF), an approach to medical image understanding in which a vision-language model (VLM) selects tools and a learned Tool Bottleneck Model (TBM) composes their outputs. It improves interpretability and performance, especially when training data is limited.

Contribution

The paper proposes a framework that composes VLM-selected tools with a learned model rather than with text (generated code or natural language), improving interpretability and effectiveness on medical imaging tasks.

Findings

01

TBF performs on par with or better than existing methods on histopathology and dermatology tasks.

02

The framework shows particular advantages in data-limited scenarios.

03

TBF improves interpretability of medical image predictions.

Abstract

Recent tool-use frameworks powered by vision-language models (VLMs) improve image understanding by grounding model predictions with specialized tools. Broadly, these frameworks leverage VLMs and a pre-specified toolbox to decompose the prediction task into multiple tool calls (often deep learning models) which are composed to make a prediction. The dominant approach to composing tools is using text, via function calls embedded in VLM-generated code or natural language. However, these methods often perform poorly on medical image understanding, where salient information is encoded as spatially-localized features that are difficult to compose or fuse via text alone. To address this, we propose a tool-use framework for medical image understanding called the Tool Bottleneck Framework (TBF), which composes VLM-selected tools using a learned Tool Bottleneck Model (TBM). For a given image and…
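To make the idea of composing tools with a learned model rather than with text more concrete, here is a minimal sketch. It is not the paper's implementation: the abstract is truncated before it describes the TBM, so everything below is an assumption made for illustration. That includes the toy tools (`edge_tool`, `intensity_tool`, `blob_tool`), the `TOOLBOX` dictionary, the zero-masking of unselected tools, and the `ToyBottleneck` class with its pooling and linear layer. The sketch only shows the general pattern: selected tools produce spatial maps, the maps are stacked as channels, and a parametric model fuses them into a prediction.

```python
import numpy as np

# Hypothetical toolbox: each toy "tool" maps an image (H, W) to a spatial map (H, W).
# Real tools would typically be deep learning models; these stand-ins keep the sketch runnable.
def edge_tool(image):
    grad_rows, grad_cols = np.gradient(image)
    return np.hypot(grad_rows, grad_cols)

def intensity_tool(image):
    value_range = image.max() - image.min()
    return (image - image.min()) / (value_range + 1e-8)

def blob_tool(image):
    return (image > image.mean()).astype(float)

TOOLBOX = {"edges": edge_tool, "intensity": intensity_tool, "blobs": blob_tool}

def run_selected_tools(image, selected):
    """Stack tool outputs as channels, shape (num_tools, H, W).

    Assumption: tools that were not selected contribute an all-zero channel,
    so the fusion model always sees a fixed-size input.
    """
    image = np.asarray(image, dtype=float)
    maps = [
        tool(image) if name in selected else np.zeros_like(image)
        for name, tool in TOOLBOX.items()
    ]
    return np.stack(maps, axis=0)

class ToyBottleneck:
    """Stand-in for a learned fusion model: global average pooling plus a linear layer.

    The weights are random here; in practice they would be trained on labeled data.
    """
    def __init__(self, num_tools, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(num_classes, num_tools))
        self.bias = np.zeros(num_classes)

    def predict_logits(self, tool_maps):
        pooled = tool_maps.mean(axis=(1, 2))  # one scalar per tool channel
        return self.weights @ pooled + self.bias

# Usage: pretend a VLM selected two of the three tools for this image.
image = np.random.default_rng(1).random((16, 16))
tool_maps = run_selected_tools(image, selected=["edges", "blobs"])
logits = ToyBottleneck(num_tools=len(TOOLBOX), num_classes=2).predict_logits(tool_maps)
print(tool_maps.shape, logits.shape)
```

The average pooling in this toy discards spatial layout, which a realistic fusion model (for example, a small convolutional network over the stacked maps) would retain. The point of the sketch is only that the tool outputs are combined numerically by a trainable component, not serialized into text.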

Taxonomy

Topics: AI in cancer detection · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)