Hierarchical Vision-Language Learning for Medical Out-of-Distribution Detection

Runhe Lai; Xinhua Lu; Kanghao Chen; Qichao Chen; Wei-Shi Zheng; Ruixuan Wang

arXiv:2508.17667·cs.CV·August 26, 2025

TL;DR

This paper introduces a hierarchical vision-language framework for medical out-of-distribution detection, enhancing the identification of unknown diseases by integrating multi-scale visual features and generating hard pseudo-OOD samples.

Contribution

It proposes two components for improving medical OOD detection with vision-language models: a cross-scale visual fusion strategy and a cross-scale hard pseudo-OOD sample generation strategy.

Findings

1. Outperforms existing methods on three public datasets
2. Enriches medical image representations with multi-scale features
3. Effective in detecting challenging unknown diseases

Abstract

In trustworthy medical diagnosis systems, integrating out-of-distribution (OOD) detection aims to identify unknown diseases in samples, thereby mitigating the risk of misdiagnosis. In this study, we propose a novel OOD detection framework based on vision-language models (VLMs), which integrates hierarchical visual information to cope with challenging unknown diseases that resemble known diseases. Specifically, a cross-scale visual fusion strategy is proposed to couple visual embeddings from multiple scales. This enriches the detailed representation of medical images and thus improves the discrimination of unknown diseases. Moreover, a cross-scale hard pseudo-OOD sample generation strategy is proposed to benefit OOD detection maximally. Experimental evaluations on three public medical datasets support that the proposed framework achieves superior OOD detection performance compared to…
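The abstract does not spell out how the multi-scale embeddings are coupled or how the OOD score is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (hypothetically) that each image yields one embedding per scale, that fusion is a weighted average of L2-normalized embeddings, and that the OOD score is one minus the maximum softmax probability over image-text cosine similarities, a common baseline for VLM-based OOD detection. The names `fuse_scales`, `ood_score`, and `temperature` are illustrative.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_scales(scale_embeddings, weights=None):
    """Fuse per-scale embeddings by a weighted average (an assumed fusion rule).

    scale_embeddings: list of equal-length vectors, one per visual scale.
    weights: optional per-scale weights; defaults to a uniform average.
    """
    n = len(scale_embeddings)
    weights = weights or [1.0 / n] * n
    normalized = [l2_normalize(e) for e in scale_embeddings]
    dim = len(normalized[0])
    fused = [sum(w * e[i] for w, e in zip(weights, normalized)) for i in range(dim)]
    return l2_normalize(fused)


def ood_score(image_embedding, text_embeddings, temperature=0.01):
    """Return 1 - max softmax probability over cosine similarities.

    text_embeddings: one vector per known disease class (e.g. from text prompts).
    A higher score means the image matches no known class confidently.
    """
    img = l2_normalize(image_embedding)
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embeddings]
    top = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in sims]
    return 1.0 - max(exps) / sum(exps)
```

In use, an image whose fused embedding aligns with one class prompt gets a score near 0, while an image equally dissimilar to all prompts gets a score near 1 - 1/K for K known classes. A threshold on this score would then separate known from unknown diseases; the threshold itself would have to be chosen on validation data.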
