Multimodal Forecasting of Sparse Intraoperative Hypotension Events Powered by Language Model
Jintao Zhang, Zirui Liu, Mingyue Cheng, Shilong Zhang, Tingyue Pan, Yitong Zhou, Qi Liu, Yanhu Xie

TL;DR
This paper introduces IOHFuseLM, a multimodal language model that improves intraoperative hypotension prediction by integrating physiological data and clinical descriptions through a two-stage training process and token-level alignment.
Contribution
The paper presents a novel multimodal framework with domain adaptive pretraining and token-level alignment for better IOH event prediction.
Findings
Outperforms baseline models in IOH event detection
Effective integration of clinical descriptions and physiological data
Demonstrates robustness across two intraoperative datasets
Abstract
Intraoperative hypotension (IOH) frequently occurs under general anesthesia and is strongly linked to adverse outcomes such as myocardial injury and increased mortality. Despite its significance, IOH prediction is hindered by event sparsity and the challenge of integrating static and dynamic data across diverse patients. In this paper, we propose IOHFuseLM, a multimodal language model framework. To accurately identify and differentiate sparse hypotensive events, we leverage a two-stage training strategy. The first stage involves domain adaptive pretraining on IOH physiological time series augmented through diffusion methods, thereby enhancing the model's sensitivity to patterns associated with hypotension. Subsequently, task fine-tuning is performed on the original clinical dataset to further enhance the ability to distinguish normotensive from hypotensive states. To enable…
Peer Reviews
Decision · ICLR 2026 Conference Withdrawn Submission
* The problem this work tries to address has high significance, and its solution is beneficial to its specific domain, i.e., providing insights into safe anesthesia.
* The proposed framework achieves the best performance among the benchmark models studied in this work, and this state-of-the-art performance is validated on two datasets.
* The visualization in Appendix F gives more insight into the model's behavior in the first pretraining stage.
The weaknesses are mainly in two aspects:
* Methodological insufficiency/unclearness:
  * When preparing the text input, the additional GPT generation lacks justification. If the input text is just the template of "The age of patient is {age of patient}, gender is {gender of patient}, and the type of surgery is {surgery type of patient}", will the pretraining stage become easier or harder? A relevant issue is that although the appendix provides the prompt, there is no quality check on the ou
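The fixed-template baseline this reviewer asks about can be sketched in a few lines. This is a minimal illustration, not the authors' preprocessing: the function name `render_static_description`, the shortened field names, and the sample patient values are all hypothetical; only the sentence template comes from the review.

```python
# Sentence template quoted in the review, with shortened placeholder names
# (the shortened names are an assumption made for readability).
TEMPLATE = (
    "The age of patient is {age}, gender is {gender}, "
    "and the type of surgery is {surgery_type}"
)


def render_static_description(age, gender, surgery_type):
    """Fill the fixed template with one patient's static attributes."""
    return TEMPLATE.format(age=age, gender=gender, surgery_type=surgery_type)


# Hypothetical patient record, for illustration only.
print(render_static_description(63, "female", "laparoscopic cholecystectomy"))
```

A baseline like this would let an ablation compare deterministic template text against the GPT-generated descriptions without any external model call.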
* The focus on a real clinical problem (IOH) is interesting. I do appreciate the task-first rather than method-first approach.
* Experiments in Table 1 compare to sufficient alternative methods that cover both other general purpose approaches and recent methods specific to IOH.
* The attention to ablations in Sec. 5 is welcome and seems reasonably thorough.
* The approach seems sufficiently original, in that I haven't seen token-level "alignment" of vital sign embeddings and descriptions of static
Overall, this is an interesting approach but I worry about the soundness, clarity and reproducibility of the methods description (see concerns M1-M5 below) as well as the thoroughness and reproducibility of the experimental design (see concerns E1-E8).

## Issues with modeling approach

### M1: Value of the prediction task in real clinical workflow remains unclear

The model is intended to predict intraoperative hypotension (IOH) events, when

> mean arterial pressure (MAP) remains below 65 m
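To make the quoted event definition concrete, here is a small sketch that finds runs of consecutive samples in which MAP stays below a threshold. The threshold value 65 is taken from the quoted text; the excerpt is cut off before the unit and the required duration, so the minimum run length is left as a parameter (`min_samples`) and is purely an assumption, as are the function name and the toy series.

```python
def find_low_map_runs(map_values, threshold=65.0, min_samples=1):
    """Return (start, end) index pairs, end exclusive, for each run of
    consecutive samples strictly below `threshold` that lasts at least
    `min_samples` samples. `min_samples` is an assumed stand-in for the
    duration criterion, which is truncated in the excerpt."""
    runs = []
    start = None
    for i, value in enumerate(map_values):
        if value < threshold:
            if start is None:
                start = i  # a new low-pressure run begins here
        else:
            if start is not None and i - start >= min_samples:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(map_values) - start >= min_samples:
        runs.append((start, len(map_values)))
    return runs


# Toy MAP series, illustrative values only.
series = [80, 70, 64, 60, 62, 72, 63, 75, 61, 59]
print(find_low_map_runs(series, min_samples=2))  # [(2, 5), (8, 10)]
```

Raising `min_samples` discards brief dips (the single sample at index 6 above), which is one way the sparsity of labeled events depends on the chosen definition.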
1. Important and Well-Defined Problem: The paper targets IOH prediction, a significant clinical challenge. It accurately identifies the core bottlenecks of existing methods: event sparsity and heterogeneous data fusion.
2. High Novelty (PCDG): The PCDG module is an outstanding innovation. It creatively uses a powerful LLM (GPT-4o) as an advanced feature encoder to "textify" static attributes. This allows them to be elegantly fused with time-series patch embeddings via token-level cross-attention
1. Dependency on GPT-4o: The framework's first critical step, PCDG, relies on a closed-source, expensive, and potentially non-stationary LLM (GPT-4o). This significantly harms reproducibility, increases deployment costs, and introduces an external dependency. Suggestion: The authors should add an ablation using a smaller, open-source model (e.g., LLaMA-3-8B or a medical-domain-specific model) to perform PCDG and report the performance difference.
2. Unvalidated Assumption in MTRDA: The MTRDA modu
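The token-level cross-attention fusion mentioned in the reviews can be illustrated with generic single-head scaled dot-product attention. This is a textbook sketch, not the paper's architecture: it has no learned projections, the choice of time-series patches as queries and text tokens as keys and values is an assumption, and the embeddings are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention without learned
    projections: each query becomes a softmax-weighted mix of the values."""
    scale = math.sqrt(len(keys[0]))
    width = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return outputs


# Toy embeddings (assumed shapes: 3 time-series patches, 2 text tokens, width 2).
patch_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_token_embeddings = [[1.0, 0.0], [0.0, 1.0]]

# Assumed direction: patches attend to the text tokens.
fused = cross_attention(patch_embeddings, text_token_embeddings, text_token_embeddings)
print(fused)
```

Each output row has the same width as the text-token embeddings and leans toward the token most similar to its patch, which is the sense in which static descriptions can condition every patch individually.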
Taxonomy
Topics: Cardiac, Anesthesia and Surgical Outcomes · Hemodynamic Monitoring and Therapy
