Is Clinical Text Enough? A Multimodal Study on Mortality Prediction in Heart Failure Patients
Oumaima El Khettari, Virgile Barthet, Guillaume Hocquet, Joconde Weller, Emmanuel Morin, Pierre Zweigenbaum

TL;DR
This study compares various transformer-based models for short-term mortality prediction in heart failure patients, showing that entity-aware multimodal transformers outperform LLMs and text-only models.
Contribution
It demonstrates that integrating clinical text with structured data using entity-level representations enhances prediction accuracy in heart failure outcomes.
Findings
Entity-aware multimodal transformers outperform other models.
When prompting LLMs, text-only prompts outperform structured or multimodal inputs.
LLMs show inconsistent performance across modalities and decoding strategies.
Abstract
Accurate short-term mortality prediction in heart failure (HF) remains challenging, particularly when relying on structured electronic health record (EHR) data alone. We evaluate transformer-based models on a French HF cohort, comparing text-only, structured-only, multimodal, and LLM-based approaches. Our results show that enriching clinical text with entity-level representations improves prediction over CLS embeddings alone, and that supervised multimodal fusion of text and structured variables achieves the best overall performance. In contrast, large language models perform inconsistently across modalities and decoding strategies, with text-only prompts outperforming structured or multimodal inputs. These findings highlight that entity-aware multimodal transformers offer the most reliable solution for short-term HF outcome prediction, while current LLM prompting remains limited for…
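To make the idea of "enriching clinical text with entity-level representations" and fusing it with structured variables more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the pooling or fusion mechanism, so mean-pooling over entity spans, concatenation as the fusion step, the logistic output head, and every name below (`fuse_features`, `predict_probability`, the span format) are assumptions made purely for illustration. The embeddings are random stand-ins for encoder outputs.

```python
import math
import random


def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse_features(token_emb, entity_spans, structured):
    """Build one fused feature vector for a single clinical note.

    token_emb    : list of per-token vectors; index 0 is assumed to be CLS.
    entity_spans : list of (start, end) token indices, end exclusive
                   (hypothetical format for recognised clinical entities).
    structured   : list of already-normalised structured EHR variables.
    """
    cls_vec = list(token_emb[0])
    if entity_spans:
        # Assumption: mean-pool tokens inside each span, then average entities.
        entity_vecs = [mean_vectors(token_emb[s:e]) for s, e in entity_spans]
        entity_vec = mean_vectors(entity_vecs)
    else:
        # No entity found: fall back to a zero vector of the same width.
        entity_vec = [0.0] * len(cls_vec)
    # Assumption: fusion by simple concatenation of the three views.
    return cls_vec + entity_vec + list(structured)


def predict_probability(features, weights, bias=0.0):
    """Logistic head mapping fused features to a mortality probability."""
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


random.seed(0)
hidden = 8
# Random stand-ins for the encoder output of a 12-token note.
token_emb = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(12)]
entity_spans = [(2, 4), (7, 10)]      # two hypothetical entity mentions
structured = [0.3, -1.2, 0.8]         # three hypothetical structured variables

features = fuse_features(token_emb, entity_spans, structured)
weights = [random.gauss(0, 0.1) for _ in features]  # untrained, illustrative
prob = predict_probability(features, weights)
```

In a supervised setting the weights of the output head (and typically the encoder) would be learned from labelled outcomes; here they are random, so `prob` carries no clinical meaning and only shows the shape of the computation.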
