Towards the Causal Complete Cause of Multi-Modal Representation Learning
Jingyao Wang, Siyu Zhao, Wenwen Qiang, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong

TL;DR
This paper introduces a causal framework for multi-modal representation learning. It argues that learned representations should be both causally sufficient and necessary, and it proposes a regularization method that enforces these properties to improve representation quality.
Contribution
It defines the Causal Complete Cause ($C^3$), explores its identifiability, and develops a regularization method to enforce causal completeness in multi-modal representations.
Findings
The $C^3$ regularization makes the learned representations more causally complete, i.e., closer to being both causally sufficient and necessary.
The twin network effectively estimates $C^3$ risk.
The method demonstrates superior performance in experiments.
Abstract
Multi-Modal Learning (MML) aims to learn effective representations across modalities for accurate predictions. Existing methods typically focus on modality consistency and specificity to learn effective representations. However, from a causal perspective, they may lead to representations that contain insufficient and unnecessary information. To address this, we propose that effective MML representations should be causally sufficient and necessary. Considering practical issues like spurious correlations and modality conflicts, we relax the exogeneity and monotonicity assumptions prevalent in prior works and explore the concepts specific to MML, i.e., Causal Complete Cause ($C^3$). We begin by defining $C^3$, which quantifies the probability of representations being causally sufficient and necessary. We then discuss the identifiability of $C^3$ and introduce an instrumental variable to…
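The abstract's notion of a representation being "causally sufficient and necessary" is closely related to the classical probability of necessity and sufficiency (PNS) from Pearl's causality framework. To illustrate why the exogeneity and monotonicity assumptions matter, the minimal sketch below computes the standard Tian–Pearl bounds on PNS from interventional probabilities, plus the point value PNS takes when monotonicity is assumed. Under exogeneity, P(y | do(x)) equals the observational P(y | x), which is what allows these quantities to be read off observational data. This is not the paper's $C^3$ estimator, nor its instrumental-variable or twin-network procedure; the function names `pns_bounds` and `pns_under_monotonicity` are hypothetical.

```python
from typing import Tuple


def _check_prob(p: float) -> None:
    # Interventional probabilities must be valid probabilities.
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")


def pns_bounds(p_y_do_x: float, p_y_do_xprime: float) -> Tuple[float, float]:
    """Tian-Pearl bounds on PNS from interventional data alone (no monotonicity).

    p_y_do_x      = P(y | do(x)):  outcome probability when the cause is set to x
    p_y_do_xprime = P(y | do(x')): outcome probability when the cause is set to x'
    """
    _check_prob(p_y_do_x)
    _check_prob(p_y_do_xprime)
    # Lower bound: max{0, P(y_x) - P(y_x')}
    lower = max(0.0, p_y_do_x - p_y_do_xprime)
    # Upper bound: min{P(y_x), P(y'_x')}, where P(y'_x') = 1 - P(y_x')
    upper = min(p_y_do_x, 1.0 - p_y_do_xprime)
    return lower, upper


def pns_under_monotonicity(p_y_do_x: float, p_y_do_xprime: float) -> float:
    """Point value of PNS when monotonicity holds (y_x' implies y_x)."""
    _check_prob(p_y_do_x)
    _check_prob(p_y_do_xprime)
    # Monotonicity forces P(y | do(x)) >= P(y | do(x')); reject data that contradict it.
    if p_y_do_x < p_y_do_xprime:
        raise ValueError("data contradict the monotonicity assumption")
    return p_y_do_x - p_y_do_xprime


if __name__ == "__main__":
    print(pns_bounds(0.9, 0.3))
    print(pns_under_monotonicity(0.9, 0.3))
```

For example, with P(y | do(x)) = 0.9 and P(y | do(x')) = 0.3, the bounds give an interval of about [0.6, 0.7], while assuming monotonicity pins PNS to the lower end, about 0.6. The width of that interval shows what is lost when monotonicity cannot be assumed, which is the setting the abstract says $C^3$ is designed for.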
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Focus
