Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers
Malte Tölle, Mohamad Scharaf, Samantha Fischer, Christoph Reich, Silav Zeid, Christoph Dieterich, Benjamin Meder, Norbert Frey, Philipp Wild, Sandy Engelhardt

TL;DR
This paper introduces ViTiMM, a vision transformer-based approach that visualizes diverse patient data modalities as images and text, simplifying multi-modal modeling and improving in-hospital mortality and phenotyping predictions.
Contribution
The paper presents a novel method that unifies multi-modal medical data as images and text for transformer-based modeling, reducing complexity and outperforming existing methods.
Findings
Outperforms state-of-the-art methods on in-hospital mortality prediction
Effective across multiple modalities including images and signals
Simplifies multi-modal data integration for medical AI
Abstract
A patient undergoes multiple examinations in each hospital stay, where each provides different facets of the health status. These assessments include temporal data with varying sampling rates, discrete single-point measurements, therapeutic interventions such as medication administration, and images. While physicians are able to process and integrate diverse modalities intuitively, neural networks need specific modeling for each modality, complicating the training procedure. We demonstrate that this complexity can be significantly reduced by visualizing all information as images along with unstructured text and subsequently training a conventional vision-text transformer. Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), not only simplifies data preprocessing and modeling but also outperforms current state-of-the-art methods in predicting…
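To make the idea of "visualizing all information as images" concrete, the following is a minimal sketch of how an irregularly sampled measurement series could be drawn as a small line-plot image. It is not the authors' pipeline: the function name `rasterize_series`, the image size, the binary pixel encoding, and the example heart-rate values are all assumptions chosen for illustration.

```python
def rasterize_series(times, values, width=64, height=32):
    """Draw an irregularly sampled series as a binary line-plot image.

    Hypothetical helper for illustration only. Returns `height` rows of
    `width` ints (0 or 1); row 0 is the top of the image (largest value).
    """
    if len(times) != len(values) or not times:
        raise ValueError("times and values must be non-empty and of equal length")

    t_min, t_max = min(times), max(times)
    v_min, v_max = min(values), max(values)
    # Fall back to 1.0 so a single point or a flat series does not divide by zero.
    t_span = (t_max - t_min) or 1.0
    v_span = (v_max - v_min) or 1.0

    def to_pixel(t, v):
        # Time maps linearly to columns, so uneven gaps between
        # observations stay visible as uneven horizontal distances.
        col = round((t - t_min) / t_span * (width - 1))
        row = round((v_max - v) / v_span * (height - 1))
        return col, row

    image = [[0] * width for _ in range(height)]
    pixels = [to_pixel(t, v) for t, v in sorted(zip(times, values))]

    # Mark each observation.
    for col, row in pixels:
        image[row][col] = 1

    # Connect consecutive observations with straight line segments.
    for (c0, r0), (c1, r1) in zip(pixels, pixels[1:]):
        steps = max(abs(c1 - c0), abs(r1 - r0))
        for s in range(1, steps):
            c = round(c0 + (c1 - c0) * s / steps)
            r = round(r0 + (r1 - r0) * s / steps)
            image[r][c] = 1
    return image


# Made-up example: heart rate recorded at irregular times (hours since admission).
hours = [0.0, 0.5, 3.0, 3.2, 9.0]
heart_rate = [82, 85, 110, 104, 90]
img = rasterize_series(hours, heart_rate, width=32, height=16)
print("\n".join("".join("#" if px else "." for px in row) for row in img))
```

Because the time axis is kept linear, no resampling or imputation onto a regular grid is needed before the series is turned into pixels. Images of this kind, one per measured variable, could then be tiled into a single input for a standard vision model, which is the simplification the abstract describes.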
Taxonomy
Topics: Advanced Image Fusion Techniques · Medical Image Segmentation Techniques
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Vision Transformer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Softmax · Adam
