DBMF: A Dual-Branch Multimodal Framework for Out-of-Distribution Detection

Jiangbei Yue; Darren Treanor; Venkataraman Subramanian; Sharib Ali

arXiv:2604.08261 · cs.CV · April 16, 2026

TL;DR

This paper introduces a dual-branch multimodal framework, with a text-image branch and a vision branch, to improve out-of-distribution (OOD) detection in clinical deep learning applications.

Contribution

It presents a novel multimodal approach that leverages both text-image and vision information for more effective OOD detection, outperforming existing methods.

Findings

01

Achieves up to a 24.84% improvement over the state of the art in OOD detection.

02

Demonstrates robustness across various backbone architectures.

03

Validates effectiveness on publicly available endoscopic image datasets.

Abstract

The complex and dynamic real-world clinical environment demands reliable deep learning (DL) systems. Out-of-distribution (OOD) detection plays a critical role in enhancing the reliability and generalizability of DL models when they encounter data that deviate from the training distribution, such as unseen disease cases. However, existing OOD detection methods typically rely either on a single visual modality or solely on image-text matching, failing to fully leverage multimodal information. To overcome this challenge, we propose a novel dual-branch multimodal framework that introduces a text-image branch and a vision branch. Our framework fully exploits multimodal representations to identify OOD samples through these two complementary branches. After training, we compute scores from the text-image branch ($S_{t}$) and the vision branch ($S_{v}$), and integrate them to obtain the final OOD score $S$ …
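The abstract states only that $S_t$ and $S_v$ are integrated into $S$; the actual scoring and fusion rules are cut off in this excerpt. The following is a minimal sketch under stated assumptions and is not the authors' method. It takes $S_t$ as an MCM-style maximum softmax over cosine similarities between an image embedding and class-prompt text embeddings, $S_v$ as the maximum softmax probability (MSP) of a vision classifier, and $S$ as a convex combination with a hypothetical weight `alpha`. All function names, `temperature`, `alpha`, and the decision `threshold` are illustrative assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def text_image_score(image_emb: Sequence[float],
                     class_text_embs: Sequence[Sequence[float]],
                     temperature: float = 1.0) -> float:
    """Hypothetical S_t: max softmax over image-to-class-prompt cosine
    similarities (MCM-style; an assumption, not taken from the paper)."""
    sims = [cosine(image_emb, t) / temperature for t in class_text_embs]
    return max(softmax(sims))


def vision_score(logits: Sequence[float]) -> float:
    """Hypothetical S_v: maximum softmax probability of a vision-only
    classifier head (an assumption, not taken from the paper)."""
    return max(softmax(logits))


def fuse_scores(s_t: float, s_v: float, alpha: float = 0.5) -> float:
    """Hypothetical fusion rule: convex combination of the two branch scores.
    Under this convention a higher S means 'more in-distribution'."""
    return alpha * s_t + (1.0 - alpha) * s_v


def is_ood(s: float, threshold: float) -> bool:
    """Flag a sample as OOD when its fused score falls below the threshold."""
    return s < threshold


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for two in-distribution class prompts.
    prompts = [[1.0, 0.0], [0.0, 1.0]]
    s_t = text_image_score([0.9, 0.1], prompts)
    s_v = vision_score([2.0, 0.5])
    s = fuse_scores(s_t, s_v, alpha=0.5)
    print(f"S_t={s_t:.3f}  S_v={s_v:.3f}  S={s:.3f}  OOD={is_ood(s, 0.6)}")
```

Because both hypothetical branch scores are maximum softmax probabilities in [0, 1], a convex combination keeps $S$ on the same scale; the paper's actual integration may weight, normalize, or combine the branches differently.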
