A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding
Christina Liu, Alan Q. Wang, Joy Hsu, Jiajun Wu, Ehsan Adeli

TL;DR
The paper introduces the Tool Bottleneck Framework (TBF), an approach to medical image understanding in which a learned model composes the tools selected by a vision-language model, improving interpretability and performance, especially when data is limited.
Contribution
It proposes a new framework that uses a learned model to compose tools selected by vision-language models, enhancing interpretability and effectiveness in medical imaging tasks.
Findings
TBF performs on par with or better than existing methods on histopathology and dermatology tasks.
The framework shows particular advantages in data-limited scenarios.
TBF improves interpretability of medical image predictions.
Abstract
Recent tool-use frameworks powered by vision-language models (VLMs) improve image understanding by grounding model predictions with specialized tools. Broadly, these frameworks leverage VLMs and a pre-specified toolbox to decompose the prediction task into multiple tool calls (often deep learning models) which are composed to make a prediction. The dominant approach to composing tools is using text, via function calls embedded in VLM-generated code or natural language. However, these methods often perform poorly on medical image understanding, where salient information is encoded as spatially-localized features that are difficult to compose or fuse via text alone. To address this, we propose a tool-use framework for medical image understanding called the Tool Bottleneck Framework (TBF), which composes VLM-selected tools using a learned Tool Bottleneck Model (TBM). For a given image and…
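The composition idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the tool functions, the `select_tools` stand-in for the VLM, and the per-pixel weighted fusion used in place of a trained TBM are all hypothetical, chosen only to show tool outputs being fused as spatial maps rather than as text.

```python
from typing import Callable, Dict, List, Sequence

# Toy types (assumptions): an image and a tool output are both H x W grids.
Grid = List[List[float]]


def intensity_tool(img: Grid) -> Grid:
    """Hypothetical tool: returns a copy of the raw intensity map."""
    return [row[:] for row in img]


def edge_tool(img: Grid) -> Grid:
    """Hypothetical tool: absolute horizontal gradient, last column set to 0."""
    return [
        [abs(row[j + 1] - row[j]) if j + 1 < len(row) else 0.0
         for j in range(len(row))]
        for row in img
    ]


# Pre-specified toolbox, keyed by tool name (names are illustrative only).
TOOLBOX: Dict[str, Callable[[Grid], Grid]] = {
    "intensity": intensity_tool,
    "edges": edge_tool,
}


def select_tools(task: str) -> List[str]:
    """Stand-in for the VLM's tool selection; a real system would query a VLM."""
    return ["intensity", "edges"]


def compose(maps: Sequence[Grid], weights: Sequence[float], bias: float) -> float:
    """Stand-in for a learned composition model.

    Fuses the tool maps pixel by pixel with one weight per tool, then takes
    the maximum fused response so spatially localized evidence is preserved.
    """
    height, width = len(maps[0]), len(maps[0][0])
    fused = [
        [sum(w * m[i][j] for w, m in zip(weights, maps)) for j in range(width)]
        for i in range(height)
    ]
    return max(max(row) for row in fused) + bias


def predict(img: Grid, task: str, weights: Sequence[float], bias: float) -> float:
    """Select tools, run them on the image, and compose their spatial outputs."""
    names = select_tools(task)
    maps = [TOOLBOX[name](img) for name in names]
    return compose(maps, weights, bias)


if __name__ == "__main__":
    toy_image = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
    print(predict(toy_image, "toy task", weights=[1.0, 2.0], bias=0.0))
```

In this sketch the weights are fixed by hand; in a learned setting they would be fit to labeled data, and the fusion would presumably be a neural network rather than a single weighted sum.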
Taxonomy
Topics: AI in cancer detection · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
