Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging

Bernhard Kainz; Johanna P Mueller; Matthew Baugh; Cosmin Bercea

arXiv:2605.05161 · cs.CV · May 7, 2026

TL;DR

WALDO introduces a zero-shot, reference-based localisation framework for medical imaging that leverages optimal transport and structured comparison to improve anomaly detection accuracy.

Contribution

The paper presents a training-free, optimal transport-based method that improves zero-shot localisation through structured comparison against references, and it introduces the Goldilocks zone sampling strategy.
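The idea behind Goldilocks zone sampling, as the abstract describes it, is that references which are neither too similar nor too dissimilar to the query tend to help localisation most. The sketch below illustrates one way to pick from such a middle band. The quantile bounds `lo_q` and `hi_q`, the function name `goldilocks_indices`, and the toy distances are all assumptions for illustration; this page does not state how the paper defines the zone.

```python
import numpy as np

def goldilocks_indices(distances, lo_q=0.2, hi_q=0.6, k=3, seed=0):
    """Sample up to k reference indices whose distance to the query lies in a
    middle quantile band (hypothetical bounds, not taken from the paper)."""
    d = np.asarray(distances, dtype=float)
    # Band edges: skip the most similar and the most dissimilar references.
    lo, hi = np.quantile(d, [lo_q, hi_q])
    band = np.flatnonzero((d >= lo) & (d <= hi))
    rng = np.random.default_rng(seed)
    k = min(k, band.size)
    # Sample without replacement from the band; sort for readability.
    return np.sort(rng.choice(band, size=k, replace=False))

# Toy distances from one query to ten candidate references (made up).
distances = np.array([0.01, 0.10, 0.20, 0.30, 0.40,
                      0.50, 0.60, 0.70, 0.80, 0.95])
picked = goldilocks_indices(distances)
```

With these toy values the band covers the references at distances 0.20 to 0.50, so the nearest and farthest candidates are never selected.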

Findings

01. Achieves 43.5% mAP@30 on the NOVA brain MRI benchmark, a 19% improvement over baselines.

02. Demonstrates consistent gains across different vision-language models.

03. McNemar tests confirm that the improvements are statistically significant.
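For readers unfamiliar with the McNemar test, it compares two methods on the same cases by looking only at the discordant ones: cases that one method gets right and the other gets wrong. A minimal sketch of the exact (binomial) two-sided variant follows. The counts used here are hypothetical and are not results from the paper, and the page does not say which variant of the test the authors used.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases only method A localised correctly
    c: cases only method B localised correctly
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, so no evidence either way
    k = min(b, c)
    # Probability of a split at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical discordant counts, for illustration only.
p_lopsided = mcnemar_exact(5, 20)
p_balanced = mcnemar_exact(10, 10)
```

A lopsided split of discordant cases yields a small p-value, while an even split yields a p-value of 1.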

Abstract

Zero-shot anomaly localisation via vision-language models (VLMs) offers a compelling approach for rare pathology detection, yet its performance is fundamentally limited by the absence of healthy anatomical context. We reformulate zero-shot localisation as a comparative inference problem in which anomalies are identified through structured comparison against reference distributions of normal anatomy. We introduce WALDO, a training-free framework grounded in optimal transport theory that enables comparative reasoning through: (i) entropy-weighted Sliced Wasserstein distances for anatomically-aware reference selection from DINOv2 patch distributions, (ii) Goldilocks zone sampling exploiting the non-monotonic relationship between reference similarity and localisation accuracy, and (iii) self-consistency aggregation via weighted non-maximum suppression. We theoretically analyse the…
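As background for component (i), the sketch below estimates a plain Sliced Wasserstein-2 distance between two sets of patch embeddings by projecting them onto random directions and comparing the sorted projections. It is an assumption-laden illustration: it omits the entropy weighting (whose form is not given on this page), uses random arrays in place of DINOv2 patch features, and assumes both sets contain the same number of patches.

```python
import numpy as np

def sliced_wasserstein(x, y, n_proj=128, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-2 distance between two
    equally sized point clouds x and y, each of shape (n, d)."""
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally sized point clouds")
    rng = np.random.default_rng(seed)
    # Random unit directions in R^d.
    dirs = rng.normal(size=(n_proj, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project onto each direction and sort: in one dimension the optimal
    # transport plan matches the sorted samples of the two sets.
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    # Average squared displacement over samples and directions.
    return float(np.sqrt(np.mean((px - py) ** 2)))

rng = np.random.default_rng(1)
query = rng.normal(size=(256, 16))                 # stand-in patch embeddings
near = query + 0.05 * rng.normal(size=(256, 16))   # slightly perturbed copy
far = query + 3.0                                  # strongly shifted copy

d_same = sliced_wasserstein(query, query)
d_near = sliced_wasserstein(query, near)
d_far = sliced_wasserstein(query, far)
```

Distances computed this way could serve as the similarity score used to rank candidate references, with the identical set scoring zero and the shifted set scoring highest.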

Code & Models

Repositories

bkainz/WALDO_MICCAI26_demo (GitHub)