TL;DR
WALDO introduces a zero-shot, reference-based localisation framework for medical imaging that leverages optimal transport and structured comparison to improve anomaly detection accuracy.
Contribution
WALDO is a training-free, optimal-transport-based method that improves zero-shot localisation through structured comparison against references of normal anatomy, and it introduces the Goldilocks zone sampling strategy.
Findings
Achieves 43.5% mAP@30 on the NOVA brain MRI benchmark, a 19% improvement over baselines.
Demonstrates consistent gains across different vision-language models.
McNemar tests confirm that the improvements are statistically significant.
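For readers unfamiliar with the McNemar test mentioned above, here is a minimal sketch of the two-sided exact variant for paired per-case outcomes. The summary does not say which variant the authors used, and the function name `mcnemar_exact` and the counts in the usage note are hypothetical, chosen only for illustration.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for paired binary outcomes.

    b: cases the first method gets right and the second gets wrong
    c: cases the second method gets right and the first gets wrong
    Cases where both methods agree do not enter the test.
    """
    n = b + c
    if n == 0:
        # No discordant pairs, so there is no evidence of a difference.
        return 1.0
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With hypothetical counts `b=1` and `c=9`, the call `mcnemar_exact(1, 9)` gives 22/1024, roughly 0.021. An even split such as `mcnemar_exact(5, 5)` gives 1.0.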
Abstract
Zero-shot anomaly localisation via vision-language models (VLMs) offers a compelling approach for rare pathology detection, yet its performance is fundamentally limited by the absence of healthy anatomical context. We reformulate zero-shot localisation as a comparative inference problem in which anomalies are identified through structured comparison against reference distributions of normal anatomy. We introduce WALDO, a training-free framework grounded in optimal transport theory that enables comparative reasoning through: (i) entropy-weighted Sliced Wasserstein distances for anatomically-aware reference selection from DINOv2 patch distributions, (ii) Goldilocks zone sampling exploiting the non-monotonic relationship between reference similarity and localisation accuracy, and (iii) self-consistency aggregation via weighted non-maximum suppression. We theoretically analyse the…
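To make the optimal transport component more concrete, the following is a minimal sketch of a plain Sliced Wasserstein distance between two sets of patch features. It is not the authors' implementation: the entropy weighting described in the abstract is omitted because the summary does not specify it, and the function name, the equal-size assumption, and the default number of projections are all illustrative choices.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance between two
    equally sized point clouds, each a list of d-dimensional vectors."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized point clouds")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random unit direction by normalising a Gaussian vector.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Project both clouds onto the direction and sort the 1-D values.
        px = sorted(sum(a * b for a, b in zip(p, direction)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, direction)) for p in ys)
        # In 1-D, optimal transport between equal-size samples pairs sorted values.
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_projections
```

In a reference-selection setting, one would compute this distance between the query image's patch features and each candidate reference's patch features, then choose references based on the resulting scores. How WALDO turns those scores into its Goldilocks zone selection is not detailed in this summary.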
