MDA: An Interpretable and Scalable Multi-Modal Fusion under Missing Modalities and Intrinsic Noise Conditions

Lin Fan; Yafei Ou; Cenyang Zheng; Pengyu Dai; Tamotsu Kamishima; Masayuki Ikebe; Kenji Suzuki; Xun Gong

arXiv:2406.10569 · cs.LG · November 19, 2024

TL;DR

This paper presents MDA, a multi-modal fusion model that adaptively handles missing data and noise, improving interpretability and scalability in medical diagnostics with state-of-the-art performance.

Contribution

The paper introduces the MDA model, which constructs linear relationships between modalities using continuous attention, effectively addressing heterogeneity, missing data, noise, and interpretability challenges.
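To make the idea of adaptively down-weighting modalities concrete, here is a minimal sketch of attention-weighted fusion with a mask for missing modalities. It is not the MDA architecture, which this page does not describe in detail. The function name `fuse_modalities`, the raw `scores`, and the `present` mask are all hypothetical; in a trained model the scores would be learned rather than supplied by hand.

```python
import math


def fuse_modalities(features, scores, present):
    """Fuse per-modality feature vectors with softmax attention weights.

    features: list of equal-length feature vectors, one per modality
    scores:   raw relevance score per modality (hypothetical; a trained
              model would produce these)
    present:  list of booleans; False marks a missing modality
    """
    if not any(present):
        raise ValueError("at least one modality must be present")

    # Mask missing modalities so they receive exactly zero weight.
    masked = [s if p else -math.inf for s, p in zip(scores, present)]

    # Numerically stable softmax over the modalities.
    top = max(masked)
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The fused vector is a linear combination of the present modalities.
    dim = len(features[0])
    fused = [
        sum(w * f[i] for w, f, p in zip(weights, features, present) if p)
        for i in range(dim)
    ]
    return weights, fused


# Three modalities; the third is missing and is ignored despite its high score.
weights, fused = fuse_modalities(
    features=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    scores=[0.0, 0.0, 3.0],
    present=[True, True, False],
)
```

In this sketch a missing modality gets zero weight regardless of its score, and a present but low-scoring (for example, noisy) modality contributes proportionally less to the fused vector.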

Findings

01. MDA maintains state-of-the-art performance across multiple datasets.

02. MDA aligns with clinical diagnostic standards.

03. MDA effectively reduces attention to low-correlation or noisy modalities.

Abstract

Multi-modal learning has shown exceptional performance in various tasks, especially in medical applications, where it integrates diverse medical information into comprehensive diagnostic evidence. However, several challenges remain in multi-modal learning: (1) heterogeneity between modalities, (2) uncertainty from missing modalities, (3) the influence of intrinsic noise, and (4) interpretability of the fusion results. This paper introduces the Modal-Domain Attention (MDA) model to address these challenges. MDA constructs linear relationships between modalities through continuous attention. Because it adaptively allocates attention across modalities, MDA can reduce attention to low-correlation data, missing modalities, or modalities with inherent noise, thereby maintaining state-of-the-art (SOTA) performance across various tasks on multiple public datasets. Furthermore, our…

Taxonomy

Topics: Speech Recognition and Synthesis · Time Series Analysis and Forecasting · Neural Networks and Applications