What Makes Multi-modal Learning Better than Single (Provably)

Yu Huang; Chenzhuang Du; Zihui Xue; Xuanyao Chen; Hang Zhao; Longbo Huang

arXiv:2106.04538 · cs.LG · October 27, 2021 · 43 cites

TL;DR

The paper proves that, under a widely used fusion framework, multi-modal learning achieves a smaller population risk than learning from only a subset of the modalities, and it supports the result with experiments.

Contribution

It offers a novel theoretical analysis demonstrating that multi-modal learning has a smaller population risk than uni-modal learning, explaining observed empirical advantages.
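In symbols, the claim can be sketched roughly as follows. The notation is an assumption made for illustration (the loss \ell, the data distribution \mathcal{D}, the learned predictors \hat{f}, and the modality sets \mathcal{M} and \mathcal{N} do not appear on this page), and the paper itself states the precise conditions under which the comparison holds.

```latex
% Population risk of a predictor f under a loss \ell (standard definition)
r(f) = \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ \ell\big(f(x), y\big) \big]

% Informal reading of the claim: with \mathcal{N} \subset \mathcal{M} a subset
% of the available modalities, the predictor learned from \mathcal{M} has
% population risk no larger than the one learned from \mathcal{N}
r\big(\hat{f}_{\mathcal{M}}\big) \le r\big(\hat{f}_{\mathcal{N}}\big)
```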

Findings

01. Multi-modal learning achieves lower population risk than uni-modal.
02. Theoretical justification for multi-modal superiority is established.
03. Experimental results support the theoretical claims.

Abstract

The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform their uni-modal counterparts, since more information is aggregated. Recently, joining the success of deep learning, there is an influential line of work on deep multi-modal learning, which has remarkable empirical results on various applications. However, theoretical justifications in this field are notably lacking. Can multi-modal learning provably perform better than uni-modal? In this paper, we answer this question under a most popular multi-modal fusion framework, which firstly encodes features from different modalities into a common latent space and seamlessly maps the latent representations into the task space. We prove that learning with multiple modalities achieves a smaller population risk than only using its subset of modalities. The main intuition…
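The fusion framework described in the abstract (encode each modality into a common latent space, then map the latent representation to the task space) can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not the paper's setup or experiments: the encoders are linear, the modalities are fused by summation, the loss is squared error, and every name here (`A1`, `A2`, `g`, `sample`, `fit_linear`, `empirical_risk`) is hypothetical. Because a linear encoder followed by a linear task map is itself a linear map of the inputs, the composition is fitted directly by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: two modalities and a shared latent space.
d1, d2, k = 4, 4, 3

# Hypothetical ground-truth linear encoders into the common latent space,
# and a hypothetical linear map from the latent space to a scalar target.
A1 = rng.normal(size=(d1, k))
A2 = rng.normal(size=(d2, k))
g = rng.normal(size=k)


def sample(n):
    """Draw n examples; the target depends on both modalities via the latent."""
    x1 = rng.normal(size=(n, d1))          # modality 1
    x2 = rng.normal(size=(n, d2))          # modality 2
    z = x1 @ A1 + x2 @ A2                  # fused latent representation
    y = z @ g + 0.1 * rng.normal(size=n)   # task output with small noise
    return x1, x2, y


def fit_linear(X, y):
    """Least-squares fit of the composed (encoder + task map) linear predictor."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def empirical_risk(X, y, w):
    """Mean squared error on held-out data, a proxy for the population risk."""
    return float(np.mean((X @ w - y) ** 2))


x1_tr, x2_tr, y_tr = sample(2000)
x1_te, x2_te, y_te = sample(2000)

# Multi-modal predictor: uses both modalities.
w_multi = fit_linear(np.hstack([x1_tr, x2_tr]), y_tr)
risk_multi = empirical_risk(np.hstack([x1_te, x2_te]), y_te, w_multi)

# Uni-modal predictor: uses modality 1 only.
w_uni = fit_linear(x1_tr, y_tr)
risk_uni = empirical_risk(x1_te, y_te, w_uni)

print(f"held-out risk, both modalities: {risk_multi:.4f}")
print(f"held-out risk, modality 1 only: {risk_uni:.4f}")
```

In this toy setting the uni-modal predictor cannot recover the part of the latent representation contributed by the second modality, so its held-out error stays well above the noise level. The sketch only illustrates the intuition; it says nothing about the paper's bounds or the conditions they require.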

Videos

What Makes Multi-Modal Learning Better than Single (Provably) · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning