Promoting cross-modal representations to improve multimodal foundation models for physiological signals

Ching Fang; Christopher Sandino; Behrooz Mahasseni; Juri Minxha; Hadi Pouransari; Erdrin Azemi; Ali Moin; Ellen Zippi

arXiv:2410.16424·cs.LG·October 23, 2024

TL;DR

This paper explores pretraining strategies for multimodal healthcare models using physiological signals, emphasizing cross-modal reconstruction and modality dropout to improve downstream task performance and representation quality.

Contribution

It introduces a masked autoencoding pretraining approach with modality dropout and analyzes the impact of cross-modal objectives on model representations in healthcare data.
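To make the idea concrete, here is a minimal sketch of how patch masking and modality dropout might be combined when building the encoder input. It is an illustration under stated assumptions, not the authors' implementation: the function names `make_mask` and `masked_mse`, the parameters `mask_ratio` and `p_drop`, their default values, and the choice to score the reconstruction loss only on hidden patches are all hypothetical. Each patch is reduced to a single number to keep the example short.

```python
import random

def make_mask(num_modalities, num_patches, mask_ratio=0.5, p_drop=0.25, rng=None):
    """Return (mask, dropped); mask[m][t] is True where patch t of modality m is hidden.

    mask_ratio and p_drop are illustrative values, not taken from the paper.
    """
    rng = rng or random.Random()
    # Random patch masking, sampled independently for every modality.
    mask = [[rng.random() < mask_ratio for _ in range(num_patches)]
            for _ in range(num_modalities)]
    # Modality dropout: hide entire modalities, but never all of them.
    dropped = [m for m in range(num_modalities) if rng.random() < p_drop]
    if len(dropped) == num_modalities:
        dropped.remove(rng.choice(dropped))
    for m in dropped:
        mask[m] = [True] * num_patches
    return mask, dropped

def masked_mse(target, prediction, mask):
    """Mean squared error computed over hidden patches only (an assumed loss choice)."""
    errors = [(t - p) ** 2
              for t_row, p_row, m_row in zip(target, prediction, mask)
              for t, p, hidden in zip(t_row, p_row, m_row) if hidden]
    return sum(errors) / len(errors) if errors else 0.0
```

When a whole modality is dropped, its patches can only be reconstructed from the remaining modalities, which is what pushes the objective toward cross-modal reconstruction.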

Findings

01 Cross-modal reconstruction improves downstream task performance.

02 Modality dropout enhances model robustness and generalization.

03 Pretraining leads to more cross-modal and temporally aligned attention weights.
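One way to quantify how "cross-modal" a set of attention weights is would be to measure the share of each query token's attention that lands on tokens from a different modality. The sketch below is a hypothetical metric for illustration only; the function name `cross_modal_fraction` and its inputs are assumptions, and the paper may use a different analysis.

```python
def cross_modal_fraction(attn, modality_of):
    """Average share of attention mass that each query places on other modalities.

    attn[q][k] is the attention weight from query token q to key token k;
    modality_of[i] is the modality index of token i. Both are illustrative inputs.
    """
    total = 0.0
    for q, row in enumerate(attn):
        row_sum = sum(row)
        # Attention mass sent to keys whose modality differs from the query's.
        cross = sum(w for k, w in enumerate(row) if modality_of[k] != modality_of[q])
        total += cross / row_sum
    return total / len(attn)

# Two tokens from two different modalities.
attn = [[0.5, 0.5],
        [1.0, 0.0]]
print(cross_modal_fraction(attn, [0, 1]))  # 0.75
```

A value near 0 would mean each modality attends mostly to itself, and a value near 1 would mean attention is spent almost entirely on other modalities.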

Abstract

Many healthcare applications are inherently multimodal, involving several physiological signals. As sensors for these signals become more common, improving machine learning methods for multimodal healthcare data is crucial. Pretraining foundation models is a promising avenue for success. However, methods for developing foundation models in healthcare are still in early exploration and it is unclear which pretraining strategies are most effective given the diversity of physiological signals. This is partly due to challenges in multimodal health data: obtaining data across many patients is difficult and costly, there is a lot of inter-subject variability, and modalities are often heterogeneously informative across downstream tasks. Here, we explore these challenges in the PhysioNet 2018 dataset. We use a masked autoencoding objective to pretrain a multimodal model. We show that the model…

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Context-Aware Activity Recognition Systems · ECG Monitoring and Analysis

Methods: Softmax · Attention Is All You Need · Contrastive Learning · Sparse Evolutionary Training · Dropout