DiMPLe -- Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation

Umaima Rahman; Mohammad Yaqub; Dwarikanath Mahapatra

arXiv:2506.21237 · cs.CV · June 27, 2025

TL;DR

DiMPLe is a novel multi-modal prompt learning approach that disentangles invariant and spurious features across vision and language modalities to improve out-of-distribution generalization and robustness.

Contribution

The paper introduces a disentanglement framework for multi-modal features that combines mutual information minimization between invariant and spurious features, spurious feature regularization, and contrastive learning on invariant features to improve OOD performance.

Findings

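01 Outperforms CoOp-OOD on average across 11 datasets.

02 Achieves an absolute gain of 15.27 percentage points in base class accuracy over CoOp-OOD.

03 Achieves an absolute gain of 44.31 percentage points in novel class accuracy over CoOp-OOD.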

Abstract

We introduce DiMPLe (Disentangled Multi-Modal Prompt Learning), a novel approach to disentangle invariant and spurious features across vision and language modalities in multi-modal learning. Spurious correlations in visual data often hinder out-of-distribution (OOD) performance. Unlike prior methods that focus solely on image features, DiMPLe disentangles features within and across modalities while maintaining consistent alignment, enabling better generalization to novel classes and robustness to distribution shifts. Our method combines three key objectives: (1) mutual information minimization between invariant and spurious features, (2) spurious feature regularization, and (3) contrastive learning on invariant features. Extensive experiments show that DiMPLe outperforms CoOp-OOD when averaged across 11 diverse datasets, and achieves absolute gains of…
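The three objectives listed in the abstract can be pictured as a weighted sum of loss terms. The sketch below is illustrative only and is not the authors' implementation: the abstract does not give the exact formulations, so the mutual information term is replaced by a simple squared-cosine proxy applied only to image-side features, the spurious regularizer is assumed to push spurious-feature predictions toward a uniform distribution, and the contrastive term is assumed to be a CLIP-style cross-entropy over cosine similarities. All function names, the temperature, and the weights `lambda_mi` and `lambda_spu` are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def _log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def _class_logits(feature, class_text_feats, temperature):
    # Cosine similarity between one image feature and each class text feature.
    f = _normalize(feature)
    return [_dot(f, _normalize(t)) / temperature for t in class_text_feats]


def contrastive_loss(img_inv, txt_inv, labels, temperature=0.07):
    """Assumed CLIP-style cross-entropy on invariant features (objective 3)."""
    losses = [
        -_log_softmax(_class_logits(f, txt_inv, temperature))[y]
        for f, y in zip(img_inv, labels)
    ]
    return sum(losses) / len(losses)


def mi_proxy(inv, spu):
    """Mean squared cosine similarity between paired invariant and spurious
    features. A crude stand-in for a mutual information estimator (objective 1),
    not the estimator used in the paper."""
    sims = [_dot(_normalize(a), _normalize(b)) ** 2 for a, b in zip(inv, spu)]
    return sum(sims) / len(sims)


def spurious_regularizer(img_spu, txt_spu, temperature=0.07):
    """Assumed form of objective 2: KL(uniform || softmax) of the spurious
    logits, which is zero when spurious features carry no class information."""
    k = len(txt_spu)
    total = 0.0
    for f in img_spu:
        log_p = _log_softmax(_class_logits(f, txt_spu, temperature))
        total += -math.log(k) - sum(log_p) / k
    return total / len(img_spu)


def total_loss(img_inv, img_spu, txt_inv, txt_spu, labels,
               lambda_mi=1.0, lambda_spu=1.0):
    # Hypothetical weighting of the three terms.
    return (contrastive_loss(img_inv, txt_inv, labels)
            + lambda_mi * mi_proxy(img_inv, img_spu)
            + lambda_spu * spurious_regularizer(img_spu, txt_spu))
```

In this toy form, the total is near zero only when invariant image features align with the correct class text features, invariant and spurious features are orthogonal, and the spurious branch predicts every class with equal probability.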

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Contrastive Learning · Balanced Selection