A Multimodal Data Processing Pipeline for MIMIC-IV Dataset

Farzana Islam Adiba; Varsha Danduri; Fahmida Liza Piya; Ali Abbasi; Mehak Gupta; Rahmatollah Beheshti

arXiv:2601.11606 · cs.LG · January 21, 2026

Open Access

TL;DR

This paper introduces a comprehensive, customizable multimodal data processing pipeline for the MIMIC-IV dataset, streamlining integration of diverse data types to facilitate clinical machine learning research.

Contribution

It presents a new pipeline that automates and standardizes multimodal data integration from MIMIC-IV, improving efficiency and reproducibility over existing methods.

Findings

01. Reduces multimodal processing time significantly.

02. Supports arbitrary downstream applications.

03. Enhances reproducibility of MIMIC-based studies.

Abstract

The MIMIC-IV dataset is a large, publicly available electronic health record (EHR) resource widely used for clinical machine learning research. It comprises multiple modalities, including structured data, clinical notes, waveforms, and imaging data. Working with these disjointed modalities requires an extensive manual effort to preprocess and align them for downstream analysis. While several pipelines for MIMIC-IV data extraction are available, they target a small subset of modalities or do not fully support arbitrary downstream applications. In this work, we greatly expand our prior popular unimodal pipeline and present a comprehensive and customizable multimodal pipeline that can significantly reduce multimodal processing time and enhance the reproducibility of MIMIC-based studies. Our pipeline systematically integrates the listed modalities, enabling automated cohort selection,…

Taxonomy

Topics: Machine Learning in Healthcare · Healthcare Technology and Patient Monitoring · Electronic Health Records Systems