Orthogonalized Multimodal Contrastive Learning with Asymmetric Masking for Structured Representations

Carolin Cissee; Raneen Younis; Zahra Ahmadi

arXiv:2602.14983 · cs.LG · February 17, 2026

TL;DR

COrAL is a novel multimodal contrastive learning framework that explicitly disentangles shared, unique, and synergistic information using orthogonality constraints and asymmetric masking, leading to more stable and comprehensive representations.

Contribution

The paper introduces COrAL, a framework that explicitly models all information components in multimodal data, improving representation quality and stability over existing methods.

Findings

01 Outperforms state-of-the-art on synthetic and real datasets

02 Achieves lower variance in performance across runs

03 Produces more stable and comprehensive multimodal embeddings

Abstract

Multimodal learning seeks to integrate information from heterogeneous sources, where signals may be shared across modalities, specific to individual modalities, or emerge only through their interaction. While self-supervised multimodal contrastive learning has achieved remarkable progress, most existing methods predominantly capture redundant cross-modal signals, often neglecting modality-specific (unique) and interaction-driven (synergistic) information. Recent extensions broaden this perspective, yet they either fail to explicitly model synergistic interactions or learn different information components in an entangled manner, leading to incomplete representations and potential information leakage. We introduce COrAL, a principled framework that explicitly and simultaneously preserves redundant, unique, and synergistic information within multimodal representations. COrAL…
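The abstract is truncated before it describes how COrAL keeps the information components apart, so the following is only a generic illustration of what an orthogonality constraint between a shared and a unique embedding can look like. The function names, the squared-cosine penalty, and the toy data are all assumptions made for this sketch; they are not taken from the paper and should not be read as the authors' loss.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def orthogonality_penalty(shared, unique):
    # Hypothetical penalty: mean squared cosine similarity between each
    # sample's shared embedding and its unique embedding.
    # It is 0 when the two are orthogonal and 1 when they are collinear.
    s = l2_normalize(shared)
    u = l2_normalize(unique)
    cos = np.sum(s * u, axis=1)
    return float(np.mean(cos ** 2))

# Toy data (illustrative only): 4 samples, 8-dimensional embeddings.
rng = np.random.default_rng(0)
shared = rng.normal(size=(4, 8))

# An "entangled" unique embedding that mostly repeats the shared one.
entangled = shared + 0.1 * rng.normal(size=(4, 8))

# Remove the shared direction per sample to get an orthogonal embedding.
s_hat = l2_normalize(shared)
disentangled = entangled - np.sum(entangled * s_hat, axis=1, keepdims=True) * s_hat

print("entangled penalty:   ", orthogonality_penalty(shared, entangled))
print("disentangled penalty:", orthogonality_penalty(shared, disentangled))
```

In a training setup, a term of this kind would typically be added to the contrastive objective with a weighting coefficient, so that leakage of shared information into the unique branch is discouraged rather than strictly forbidden.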

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications