Mitigating the Modality Gap: Few-Shot Out-of-Distribution Detection with Multi-modal Prototypes and Image Bias Estimation

Yimu Wang; Evelien Riddell; Adrian Chow; Sean Sedwards; Krzysztof Czarnecki

arXiv:2502.00662·cs.CV·January 27, 2026

TL;DR

This paper introduces SUPREME, a few-shot framework that improves vision-language model (VLM)-based out-of-distribution (OOD) detection by combining in-distribution image prototypes with text prototypes, estimating image bias, and scoring with a new OOD score, which reduces false positives.

Contribution

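The paper shows that adding ID image prototypes to ID text prototypes improves VLM-based OOD detection without any additional training. It then proposes SUPREME, a few-shot tuning framework that further narrows the modality gap through biased prompts generation (BPG) and image-text consistency (ITC) modules.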

Findings

01 SUPREME outperforms existing VLM-based OOD detection methods.

02 Incorporating image prototypes reduces false positives.

03 The new OOD score $S_{GMP}$ improves detection accuracy.

Abstract

Existing vision-language model (VLM)-based methods for out-of-distribution (OOD) detection typically rely on similarity scores between input images and in-distribution (ID) text prototypes. However, the modality gap between image and text often results in high false positive rates, as OOD samples can exhibit high similarity to ID text prototypes. To mitigate the impact of this modality gap, we propose incorporating ID image prototypes along with ID text prototypes. We present theoretical analysis and empirical evidence indicating that this approach enhances VLM-based OOD detection performance without any additional training. To further reduce the gap between image and text, we introduce a novel few-shot tuning framework, SUPREME, comprising biased prompts generation (BPG) and image-text consistency (ITC) modules. BPG enhances image-text fusion and improves generalization by conditioning…
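
To make the abstract's training-free idea concrete, the following is a minimal sketch, not the paper's implementation. It contrasts a score computed from text prototypes alone with one that also uses image prototypes built from a few labeled ID embeddings. The function names, the `temperature` value, and the simple averaging of the two scores are illustrative assumptions; in particular, the averaged score here is not the paper's $S_{GMP}$, whose definition is not given in this excerpt.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def max_softmax(image_emb, prototypes, temperature=0.01):
    """Max softmax probability over cosine similarities to class prototypes.

    temperature is a hypothetical value chosen for illustration.
    """
    sims = l2_normalize(image_emb) @ l2_normalize(prototypes).T
    return softmax(sims / temperature).max(axis=-1)

def build_image_prototypes(few_shot_embs, labels, num_classes):
    """Assumed construction: mean of normalized few-shot embeddings per class."""
    protos = [
        l2_normalize(few_shot_embs[labels == c]).mean(axis=0)
        for c in range(num_classes)
    ]
    return l2_normalize(np.stack(protos))

def text_only_score(image_emb, text_protos, temperature=0.01):
    """Baseline: ID-ness judged only against text prototypes."""
    return max_softmax(image_emb, text_protos, temperature)

def multimodal_score(image_emb, text_protos, image_protos, temperature=0.01):
    """Illustrative combination (an assumption): average the text-prototype
    score and the image-prototype score. Higher means more ID-like."""
    s_text = max_softmax(image_emb, text_protos, temperature)
    s_image = max_softmax(image_emb, image_protos, temperature)
    return 0.5 * (s_text + s_image)

# Toy data: 3 classes in a 4-dimensional embedding space (synthetic, not CLIP).
basis = np.eye(4)
text_protos = basis[:3]
few_shot_embs = np.concatenate([basis[:3], basis[:3] + 0.1 * basis[3]])
labels = np.array([0, 1, 2, 0, 1, 2])
image_protos = build_image_prototypes(few_shot_embs, labels, num_classes=3)

id_sample = basis[0]        # aligned with class 0
ood_sample = np.ones(4)     # equally similar to every class

id_score = multimodal_score(id_sample, text_protos, image_protos)
ood_score = multimodal_score(ood_sample, text_protos, image_protos)
```

In this toy setup the ID sample receives a higher score than the OOD sample, so thresholding the score separates them. With real VLM embeddings, the embeddings would come from the model's image and text encoders rather than from synthetic basis vectors.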

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Digital Media Forensic Detection