Measuring and Aligning Abstraction in Vision-Language Models with Medical Taxonomies

Ben Schaper; Maxime Di Folco; Bernhard Kainz; Julia A. Schnabel; Cosmin I. Bercea

arXiv:2601.14827·cs.AI·January 22, 2026

Open Access

TL;DR

This paper evaluates vision-language models for chest X-ray classification, shows that they are misaligned with clinical taxonomies despite high flat performance, and proposes hierarchical metrics and taxonomy-aware fine-tuning to improve clinical safety and interpretability.

Contribution

It introduces hierarchical evaluation metrics and taxonomy-aware fine-tuning techniques to better align VLMs with medical taxonomies, reducing critical abstraction errors.

Findings

1. High flat performance but substantial taxonomy misalignment

2. Proposed methods reduce severe errors to below 2%

3. Hierarchical metrics improve clinical relevance of model evaluation

Abstract

Vision-Language Models show strong zero-shot performance for chest X-ray classification, but standard flat metrics fail to distinguish between clinically minor and severe errors. This work investigates how to quantify and mitigate abstraction errors by leveraging medical taxonomies. We benchmark several state-of-the-art VLMs using hierarchical metrics and introduce Catastrophic Abstraction Errors to capture cross-branch mistakes. Our results reveal substantial misalignment of VLMs with clinical taxonomies despite high flat performance. To address this, we propose risk-constrained thresholding and taxonomy-aware fine-tuning with radial embeddings, which reduce severe abstraction errors to below 2 per cent while maintaining competitive performance. These findings highlight the importance of hierarchical evaluation and representation-level alignment for safer and more clinically meaningful…
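
To make the idea of a cross-branch mistake concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical toy taxonomy (`PARENT`) and treats an error as "cross-branch" when the predicted and true labels fall under different top-level branches; the paper's exact definition of Catastrophic Abstraction Errors may differ. The helper names (`path_to_root`, `top_branch`, `tree_distance`, `is_cross_branch`, `cross_branch_rate`) are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy taxonomy (child -> parent). This is NOT the taxonomy
# used in the paper; it only illustrates the idea of branches.
PARENT: Dict[str, Optional[str]] = {
    "finding": None,
    "lung": "finding",
    "cardiac": "finding",
    "pneumonia": "lung",
    "atelectasis": "lung",
    "cardiomegaly": "cardiac",
}


def path_to_root(label: str) -> List[str]:
    """Return the nodes from `label` up to the root, inclusive."""
    path = [label]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def top_branch(label: str) -> str:
    """Return the ancestor directly below the root (or the root itself)."""
    path = path_to_root(label)
    return path[-2] if len(path) >= 2 else path[-1]


def tree_distance(a: str, b: str) -> int:
    """Number of edges between two labels via their lowest common ancestor."""
    depth_from_a = {node: steps for steps, node in enumerate(path_to_root(a))}
    for steps_b, node in enumerate(path_to_root(b)):
        if node in depth_from_a:
            return depth_from_a[node] + steps_b
    raise ValueError("labels do not share a root")


def is_cross_branch(predicted: str, true: str) -> bool:
    """Assumed notion of a severe error: prediction leaves the true branch."""
    return top_branch(predicted) != top_branch(true)


def cross_branch_rate(pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (predicted, true) pairs that are cross-branch errors."""
    return sum(is_cross_branch(p, t) for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    pairs = [
        ("pneumonia", "pneumonia"),     # correct
        ("atelectasis", "pneumonia"),   # wrong, but same branch (lung)
        ("cardiomegaly", "pneumonia"),  # wrong and cross-branch
        ("pneumonia", "atelectasis"),   # wrong, but same branch (lung)
    ]
    flat_accuracy = sum(p == t for p, t in pairs) / len(pairs)
    print("flat accuracy:", flat_accuracy)
    print("cross-branch rate:", cross_branch_rate(pairs))
```

A flat metric counts all three wrong predictions in this toy example equally, whereas the tree distance and the cross-branch flag separate the within-branch confusions from the single prediction that leaves the lung branch entirely.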

Taxonomy

Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Machine Learning in Healthcare