HyperMM : Robust Multimodal Learning with Varying-sized Inputs

Hava Chaptoukaev; Vincenzo Marcianò; Francesco Galati; Maria A. Zuluaga

arXiv:2407.20768·cs.LG·July 31, 2024

TL;DR

HyperMM is a novel end-to-end framework for robust multimodal learning that effectively handles missing modalities and varying input sizes without relying on complex imputation strategies, demonstrated on medical diagnosis tasks.

Contribution

The paper introduces HyperMM, which pairs a conditional hypernetwork, used to train a universal feature extractor, with a permutation-invariant neural network that processes the extracted features, enabling robust multimodal learning with incomplete and varying-sized inputs.
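
To illustrate what "permutation-invariant" and "varying-sized inputs" mean in practice, here is a minimal sketch of a sum-pooling set network in plain Python. It is not the authors' architecture: the layer sizes, the random weights, and the names `make_linear`, `set_forward`, `phi`, and `rho` are all hypothetical, chosen only to show that the output does not depend on the order or the number of available modality features.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Build a random (out_dim x in_dim) weight matrix with a zero bias."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(layer, x):
    """Apply an affine map to a single feature vector."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def set_forward(features, phi, rho):
    """Encode each element with phi, sum-pool across elements, then apply rho.

    Sum-pooling is what makes the result independent of element order and
    lets the same weights handle any number of elements (at least one).
    """
    if not features:
        raise ValueError("need at least one available modality")
    encoded = [relu(linear(phi, f)) for f in features]
    pooled = [math.fsum(column) for column in zip(*encoded)]
    return linear(rho, pooled)


rng = random.Random(0)
phi = make_linear(4, 8, rng)  # per-element encoder (hypothetical sizes)
rho = make_linear(8, 2, rng)  # prediction head on the pooled representation

# Hypothetical feature vectors, one per available modality.
full = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
partial = full[:2]  # same patient with one modality missing

out_full = set_forward(full, phi, rho)
out_shuffled = set_forward(list(reversed(full)), phi, rho)
out_partial = set_forward(partial, phi, rho)
```

In this sketch, `out_full` and `out_shuffled` agree because the pooled sum ignores ordering, and `out_partial` is still a valid two-dimensional output even though one modality is absent, with no imputation step involved.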

Findings

1. HyperMM outperforms existing methods in handling missing data in medical diagnosis.

2. The framework maintains high accuracy even with high rates of missing modalities.

3. HyperMM generalizes well to datasets with varying input sizes beyond missing modality scenarios.

Abstract

Combining multiple modalities carrying complementary information through multimodal learning (MML) has shown considerable benefits for diagnosing multiple pathologies. However, the robustness of multimodal models to missing modalities is often overlooked. Most works assume modality completeness in the input data, while in clinical practice, it is common to have incomplete modalities. Existing solutions that address this issue rely on modality imputation strategies before using supervised learning models. These strategies, however, are complex, computationally costly and can strongly impact subsequent prediction models. Hence, they should be used with parsimony in sensitive applications such as healthcare. We propose HyperMM, an end-to-end framework designed for learning with varying-sized inputs. Specifically, we focus on the task of supervised MML with missing imaging modalities…
