Measuring and Aligning Abstraction in Vision-Language Models with Medical Taxonomies
Ben Schaper, Maxime Di Folco, Bernhard Kainz, Julia A. Schnabel, Cosmin I. Bercea

TL;DR
This paper evaluates vision-language models in medical imaging, revealing misalignments with clinical taxonomies and proposing hierarchical metrics and fine-tuning methods to improve clinical safety and interpretability.
Contribution
It introduces hierarchical evaluation metrics and taxonomy-aware fine-tuning techniques to better align VLMs with medical taxonomies, reducing critical abstraction errors.
Findings
High flat performance but substantial taxonomy misalignment
Proposed methods reduce severe errors to below 2%
Hierarchical metrics improve clinical relevance of model evaluation
Abstract
Vision-Language Models (VLMs) show strong zero-shot performance for chest X-ray classification, but standard flat metrics fail to distinguish between clinically minor and severe errors. This work investigates how to quantify and mitigate abstraction errors by leveraging medical taxonomies. We benchmark several state-of-the-art VLMs using hierarchical metrics and introduce Catastrophic Abstraction Errors to capture cross-branch mistakes. Our results reveal substantial misalignment of VLMs with clinical taxonomies despite high flat performance. To address this, we propose risk-constrained thresholding and taxonomy-aware fine-tuning with radial embeddings, which reduce severe abstraction errors to below 2% while maintaining competitive performance. These findings highlight the importance of hierarchical evaluation and representation-level alignment for safer and more clinically meaningful…
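To make the idea of a cross-branch mistake concrete, here is a minimal sketch of how one might count such errors on a toy label tree. It is an illustration only: the taxonomy, the helper names (`PARENT`, `ancestors`, `top_branch`, `tree_distance`, `cross_branch_rate`), and the rule "an error is cross-branch when the true and predicted labels fall under different top-level branches" are all assumptions made for this example, not the paper's definition of Catastrophic Abstraction Errors or its hierarchical metrics.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "pneumonia": "lung opacity",
    "atelectasis": "lung opacity",
    "lung opacity": "abnormal",
    "cardiomegaly": "cardiac finding",
    "cardiac finding": "abnormal",
    "abnormal": None,
}


def ancestors(label):
    """Return the path from `label` up to the root, with `label` first."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def top_branch(label):
    """Return the ancestor directly below the root (the root maps to itself)."""
    path = ancestors(label)
    return path[-2] if len(path) >= 2 else path[-1]


def tree_distance(a, b):
    """Count the edges between `a` and `b` via their lowest common ancestor."""
    pa, pb = ancestors(a), ancestors(b)
    common = set(pa) & set(pb)
    da = next(i for i, node in enumerate(pa) if node in common)
    db = next(i for i, node in enumerate(pb) if node in common)
    return da + db


def cross_branch_rate(y_true, y_pred):
    """Fraction of predictions that land in a different top-level branch."""
    errors = sum(top_branch(t) != top_branch(p) for t, p in zip(y_true, y_pred))
    return errors / len(y_true)


if __name__ == "__main__":
    y_true = ["pneumonia", "pneumonia", "cardiomegaly"]
    y_pred = ["atelectasis", "cardiomegaly", "cardiomegaly"]
    # A flat metric treats both wrong predictions the same; the tree does not.
    print(tree_distance("pneumonia", "atelectasis"))
    print(tree_distance("pneumonia", "cardiomegaly"))
    print(cross_branch_rate(y_true, y_pred))
```

In this toy setup, confusing pneumonia with atelectasis stays inside the same branch, while confusing pneumonia with cardiomegaly crosses branches. Flat accuracy counts both as one error each, which is the gap the abstract says hierarchical evaluation is meant to expose.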
Taxonomy
Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Machine Learning in Healthcare
