Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers

Malte Tölle; Mohamad Scharaf; Samantha Fischer; Christoph Reich; Silav Zeid; Christoph Dieterich; Benjamin Meder; Norbert Frey; Philipp Wild; Sandy Engelhardt

arXiv:2501.18237·cs.CV·January 31, 2025

Open Access

TL;DR

This paper introduces ViTiMM, a vision transformer-based approach that visualizes diverse patient data modalities as images and text, simplifying multi-modal modeling and improving in-hospital mortality and phenotyping predictions.

Contribution

The paper presents a novel method that unifies multi-modal medical data as images and text for transformer-based modeling, reducing complexity and outperforming existing methods.

Findings

01 Outperforms state-of-the-art methods in in-hospital mortality prediction

02 Effective across multiple modalities, including images and signals

03 Simplifies multi-modal data integration for medical AI

Abstract

A patient undergoes multiple examinations during each hospital stay, each of which captures a different facet of their health status. These assessments include temporal data with varying sampling rates, discrete single-point measurements, therapeutic interventions such as medication administration, and images. While physicians are able to process and integrate diverse modalities intuitively, neural networks need specific modeling for each modality, which complicates the training procedure. We demonstrate that this complexity can be significantly reduced by visualizing all information as images along with unstructured text and subsequently training a conventional vision-text transformer. Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), not only simplifies data preprocessing and modeling but also outperforms current state-of-the-art methods in predicting…
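To make the idea of "visualizing all information as images" concrete, here is a minimal sketch of one possible way to turn an irregularly sampled measurement series into a fixed-size image. It is an illustration only and not the authors' pipeline: the function name `render_series_as_image`, the binary line-plot rendering, and the 32x16 grid are all assumptions chosen for brevity. The point it shows is that samples taken at uneven time intervals land at their true horizontal positions, so no resampling or imputation step is needed before the data reaches an image model.

```python
from typing import List, Sequence


def render_series_as_image(
    times: Sequence[float],
    values: Sequence[float],
    width: int = 32,
    height: int = 16,
) -> List[List[int]]:
    """Rasterize an irregularly sampled series into a height x width grid.

    Hypothetical helper for illustration: pixels are 1 where the line plot
    passes and 0 elsewhere. Row 0 is the top of the image.
    """
    if len(times) != len(values) or len(times) == 0:
        raise ValueError("times and values must be non-empty and equally long")

    t_min, t_max = min(times), max(times)
    v_min, v_max = min(values), max(values)
    t_span = (t_max - t_min) or 1.0  # avoid division by zero for one sample
    v_span = (v_max - v_min) or 1.0  # avoid division by zero for flat series

    def to_pixel(t: float, v: float) -> tuple:
        # Time maps to the column, value maps to the row (larger value = higher up).
        col = round((t - t_min) / t_span * (width - 1))
        row = round((v_max - v) / v_span * (height - 1))
        return col, row

    image = [[0] * width for _ in range(height)]
    pixels = [to_pixel(t, v) for t, v in sorted(zip(times, values))]

    # Mark every sample, then connect consecutive samples with straight segments.
    for col, row in pixels:
        image[row][col] = 1
    for (c0, r0), (c1, r1) in zip(pixels, pixels[1:]):
        steps = max(abs(c1 - c0), abs(r1 - r0), 1)
        for s in range(steps + 1):
            col = round(c0 + (c1 - c0) * s / steps)
            row = round(r0 + (r1 - r0) * s / steps)
            image[row][col] = 1
    return image


if __name__ == "__main__":
    # Made-up example values: five readings at uneven times (in hours).
    hours = [0.0, 0.5, 3.0, 7.5, 8.0]
    readings = [80, 95, 110, 90, 85]
    for pixel_row in render_series_as_image(hours, readings):
        print("".join("#" if p else "." for p in pixel_row))
```

In a full system, one such image per measurement type could be tiled into a single picture and passed to a vision transformer together with the text input; how ViTiMM actually lays out and styles its plots is not specified in the abstract above.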

Taxonomy

Topics: Advanced Image Fusion Techniques · Medical Image Segmentation Techniques

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Vision Transformer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Softmax · Adam