Provable Dynamic Fusion for Low-Quality Multimodal Data

Qingyang Zhang; Haitao Wu; Changqing Zhang; Qinghua Hu; Huazhu Fu; Joey Tianyi Zhou; Xi Peng

arXiv:2306.02050 · cs.LG · June 7, 2023 · 20 citations

TL;DR

This paper introduces Quality-aware Multimodal Fusion (QMF), a theoretically justified dynamic fusion framework that uses uncertainty estimation to improve classification accuracy and robustness, especially on low-quality data.

Contribution

It provides the first theoretical analysis of dynamic multimodal fusion and proposes a novel, provably robust fusion method based on uncertainty estimation.

Findings

1. QMF improves classification accuracy
2. QMF enhances model robustness
3. Experimental results validate theoretical insights

Abstract

The inherent challenge of multimodal fusion is to precisely capture cross-modal correlation and flexibly conduct cross-modal interaction. To fully exploit the value of each modality and mitigate the influence of low-quality multimodal data, dynamic multimodal fusion has emerged as a promising learning paradigm. Despite its widespread use, theoretical justifications in this field are still notably lacking. Can we design a provably robust multimodal fusion method? This paper provides a theoretical analysis, from the generalization perspective, to answer this question under one of the most popular multimodal fusion frameworks. We proceed to reveal that several uncertainty estimation solutions are naturally available to achieve robust multimodal fusion. Then a novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of…
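To make the idea of uncertainty-driven dynamic fusion concrete, here is a minimal sketch in plain Python. It is not the authors' QMF implementation: this page does not say which uncertainty estimator QMF uses, so the entropy-based confidence score below is an assumption chosen for illustration. The function names `confidence` and `dynamic_late_fusion` are hypothetical. The sketch shows only the general pattern: each modality's logits are weighted per sample by how confident that modality appears, so a low-quality modality contributes less to the fused decision.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """Assumed confidence score: 1 minus normalized softmax entropy, in [0, 1].

    This is an illustrative stand-in, not the estimator used in the paper.
    """
    if len(logits) < 2:
        raise ValueError("need at least two classes")
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - entropy / math.log(len(logits)))


def dynamic_late_fusion(per_modality_logits):
    """Fuse per-modality logits with per-sample, confidence-based weights."""
    scores = [confidence(logits) for logits in per_modality_logits]
    total = sum(scores)
    if total == 0.0:
        # Every modality is maximally uncertain: fall back to uniform weights.
        weights = [1.0 / len(scores)] * len(scores)
    else:
        weights = [s / total for s in scores]
    n_classes = len(per_modality_logits[0])
    fused = [
        sum(w * logits[k] for w, logits in zip(weights, per_modality_logits))
        for k in range(n_classes)
    ]
    return fused, weights


# Example: a clean modality with a peaked prediction and a noisy, near-uniform one.
clean_logits = [4.0, 0.5, 0.2]
noisy_logits = [0.3, 0.4, 0.35]
fused, weights = dynamic_late_fusion([clean_logits, noisy_logits])
predicted_class = fused.index(max(fused))
```

In this example the noisy modality receives a much smaller weight than the clean one, so the fused prediction follows the clean modality. A static fusion scheme with fixed equal weights would not adapt to per-sample quality in this way.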

Code & Models

Repositories

qingyangzhang/qmf (PyTorch, official)

Videos

Provable Dynamic Fusion for Low-Quality Multimodal Data (SlidesLive)

Taxonomy

Topics: Remote-Sensing Image Classification