On the Comparison between Multi-modal and Single-modal Contrastive Learning
Wei Huang, Andi Han, Yongqiang Chen, Yuan Cao, Zhiqiang Xu, Taiji Suzuki

TL;DR
This paper develops a theoretical framework to compare multi-modal and single-modal contrastive learning, revealing the critical role of signal-to-noise ratio in their generalization performance, supported by empirical validation.
Contribution
It introduces a unified feature learning theory for contrastive learning, analyzing the impact of signal-to-noise ratio and modality cooperation on downstream task performance.
Findings
Multi-modal contrastive learning outperforms single-modal contrastive learning on downstream tasks.
Signal-to-noise ratio critically influences generalization.
Theoretical analysis aligns with empirical results on synthetic and real datasets.
Abstract
Multi-modal contrastive learning with language supervision has presented a paradigm shift in modern machine learning. By pre-training on a web-scale dataset, multi-modal contrastive learning can learn high-quality representations that exhibit impressive robustness and transferability. Despite its empirical success, the theoretical understanding is still in its infancy, especially regarding its comparison with single-modal contrastive learning. In this work, we introduce a feature learning theory framework that provides a theoretical foundation for understanding the differences between multi-modal and single-modal contrastive learning. Based on a data generation model consisting of signal and noise, our analysis is performed on a ReLU network trained with the InfoMax objective function. Through a trajectory-based optimization analysis and generalization characterization on downstream…
Taxonomy
Topics: Speech and dialogue systems
Methods: Contrastive Learning
