Research Topics and Trends in MIMIC‐IV: A Large ICU Database Relevant for Critical Care Nursing
Yuh‐Shan Ho, Ahmed Ben Salem, Mahdi Kchaou, Abdulhameed Dere, Yosra Mzid, Houcemeddine Turki

TL;DR
This paper maps research themes in the MIMIC-IV ICU database from 2021–2024, highlighting topics relevant to critical care nursing and data-driven decision-making.
Contribution
This is the first scoping review of MIMIC-IV research themes, emphasizing their relevance to nursing practice and clinical workflows.
Findings
Dominant research areas include mortality prediction, sepsis, and acute kidney injury.
Themes are directly relevant to nursing-sensitive outcomes and bedside decision-making.
The study offers a structured evidence base for guiding future data-driven nursing research.
Abstract
The Medical Information Mart for Intensive Care‐IV (MIMIC‐IV) clinical database has become a central resource for data‐driven critical care research, enabling advances in clinical informatics, machine learning and nursing science. Despite its rapid uptake, no prior study has provided a transparent, methodologically grounded, bibliometrics‐based overview of MIMIC‐IV‐related research output. This paper aims to map the major research themes associated with the MIMIC‐IV database (2021–2024) and to evaluate their relevance to critical care nursing research and practice. We analysed 1150 publications retrieved from the Web of Science Core Collection (SCI‐Expanded). Explicit search strategies, front‐page filtering and publication counts were used to identify and analyse keyword‐based research themes. Keyword analyses identified mortality prediction, sepsis, acute kidney injury, intensive…

| Words in the title | TP | Rank (%) | Author keywords | TP | Rank (%) |
|---|---|---|---|---|---|
| Patients | 865 | 1 (75) | Mortality | 279 | 1 (27) |
| Mortality | 534 | 2 (46) | Sepsis | 223 | 2 (22) |
| Retrospective | 349 | 3 (30) | Intensive care unit | 169 | 3 (16) |
| Acute | 299 | 4 (26) | Machine learning | 123 | 4 (12) |
| Association | 283 | 5 (25) | Acute kidney injury | 120 | 5 (12) |
| Database | 276 | 6 (24) | Prognosis | 105 | 6 (8.3) |
| Ill | 227 | 7 (20) | In‐hospital mortality | 86 | 7 (8.0) |
| Critically | 225 | 8 (20) | Nomogram | 83 | 8 (6.8) |
| Cohort | 223 | 9 (19) | Critical care | 70 | 9 (4.8) |
| Sepsis | 182 | 10 (16) | All‐cause mortality | 50 | 10 (4.7) |
| Injury | 177 | 11 (15) | ICU | 49 | 11 (3.7) |
| Kidney | 167 | 12 (15) | Prediction model | 38 | 12 (3.2) |
| In‐hospital | 163 | 13 (14) | Heart failure | 33 | 13 (3.1) |
| Ratio | 148 | 14 (13) | Acute myocardial infarction | 32 | 14 (3.0) |
| Prediction | 144 | 15 (13) | Atrial fibrillation | 31 | 15 (2.9) |
| Risk | 136 | 16 (12) | Acute pancreatitis | 30 | 16 (2.8) |
| Learning | 134 | 17 (12) | 28‐day mortality | 29 | 17 (2.7) |
| Machine | 128 | 18 (11) | Septic shock | 28 | 17 (2.7) |
| Unit | 126 | 19 (11) | Prediction | 28 | 19 (2.6) |
| ICU | 117 | 20 (10) | Critically ill patients | 27 | 20 (2.5) |

| Major research theme | Key insights from MIMIC‐IV studies | Relevant critical care nursing competencies | Practice implications for nursing |
|---|---|---|---|
| Mortality prediction and risk stratification | Identifies robust biomarkers (e.g., TyG index, SHR, NLR) and models for prognosticating outcomes in sepsis, CVD and stroke. | | Enables nurses to identify high‐risk patients earlier, tailor monitoring intensity and prioritise care interventions based on data‐driven risk scores. |
| Sepsis and septic shock | Evaluates prognostic factors, optimal timing for interventions (e.g., vasopressin) and efficacy of therapies (e.g., aspirin, anticoagulation). | | Empowers nurses as key agents in early detection, protocol‐driven management and team‐based titration of therapies to improve sepsis outcomes. |
| Acute kidney injury (AKI) | Models for predicting AKI onset and progression, links to sepsis, pancreatitis, and drug exposures (e.g., contrast, antibiotics). | | Guides preventive nursing actions, such as vigilant monitoring of at‐risk patients and advocating for nephroprotective strategies. |
| ICU workflows and outcome prediction | Machine learning models predict LOS, mortality and complications; algorithms process high‐volume time‐series data. | | Helps nurses interpret AI‐generated alerts, participate in model refinement and use predictions to streamline care planning and resource allocation. |
| Cardiovascular disorders | Prognostic insights for AF, HF and MI focussing on metabolic control, inflammation and frailty. | | Supports nurses in managing cardiac instability and providing targeted education based on individualised risk profiles identified in the ICU. |
| Acute pancreatitis (AP) | Identifies prognostic markers (e.g., lactate‐albumin ratio, TyG index) for mortality; highlights associations with AKI, sepsis and multi‐organ dysfunction. | | Enables early recognition of patients at risk for severe AP or organ failure, guiding vigilant monitoring, timely fluid resuscitation and coordinated care with nutritional and gastrointestinal specialists. |
| Models and algorithms | Development and validation of interpretable AI tools (e.g., nomograms, ML pipelines) for clinical decision support. | | |

| Disease domain | Indicator/intervention/model | Outcome or prognostic value | Key references |
|---|---|---|---|
| General intensive and critical care | ICU length‐of‐stay models | Predict recovery trajectory and resource utilisation | [ |
| | Global ICU mortality models | Support early deterioration recognition | [ |
| | Ketamine exposure | Associated with outcomes in ventilated patients | [ |
| | BMI and delirium | Identify neurologic complication risk | [ |
| Sepsis and septic shock | Prophylactic heparin | Early anticoagulation reduces sepsis mortality | [ |
| | Initial ventilation strategy | Influences in‐hospital mortality | [ |
| | Triglyceride–Glucose (TyG) Index | Independent predictor of in‐hospital mortality, sepsis‐associated AKI and prolonged hospitalisation | [ |
| | Systemic Immune‐Inflammation Index (SII) | High‐fidelity inflammatory marker predicting mortality | [ |
| | Vasopressin initiation timing | Time‐dependent determinant of survival in septic shock | [ |
| | Dynamic vasoactive medication trends | Enables real‐time mortality risk prediction using ML | [ |
| | Aspirin exposure | Associated with improved outcomes in sepsis‐induced myocardial injury and AKI | [ |
| Acute kidney injury (AKI) | Glycaemic variability | Independent predictor of ICU 30‐day mortality | [ |
| | Pulse wave velocity | Reflects haemodynamic stress influencing AKI outcomes | [ |
| | TyG Index | Associated with AKI severity, length of stay and sepsis‐related outcomes | [ |
| | Serum calcium and magnesium | Predict AKI onset in cirrhotic and acute pancreatitis patients | [ |
| | Ondansetron exposure | Associated with reduced AKI mortality | [ |
| | Early AKI prediction models | Predict AKI development within 7 days of ICU admission | [ |
| Cardiovascular disorders | Stress hyperglycaemia ratio | Predicts all‐cause mortality in critically ill AF patients | [ |
| | TyG‐BMI | Strong predictor of 1‐year mortality in AF and heart failure | [ |
| | Serum anion gap | Independent prognostic factor in myocardial infarction | [ |
| | EASIX score | Marker of endothelial dysfunction linked to AF mortality | [ |
| | Influenza vaccination | Modifiable factor reducing AF mortality | [ |
| | Frailty indices | Predict short‐ and long‐term mortality in HF and MI | [ |
| | Machine‐learning mortality models | Improve risk stratification in MI and HF‐AF populations | [ |
| Acute pancreatitis (AP) | Lactate‐albumin ratio | Superior predictor of 28‐day mortality | [ |
| | Bilirubin‐to‐albumin ratio | Associated with short‐ and long‐term mortality | [ |
| | RDW‐to‐albumin ratio | Reflects inflammation–nutrition interaction | [ |
| | TyG Index | Predicts disease severity and sepsis risk | [ |
| | Laboratory‐based frailty index | Quantifies physiologic reserve and mortality risk | [ |

| Dimension/nursing learning outcome | Core MIMIC‐IV insight | Relevance to critical care nursing practice | Educational/conceptual implication | Key references |
|---|---|---|---|---|
| Therapeutic timing and time‐critical intervention awareness | Clinical outcomes depend more on | Reinforces nurse vigilance for time‐sensitive therapies, escalation and protocol adherence | Emphasises narrow therapeutic windows and urgency in nursing decision‐making | [ |
| Dynamic physiologic trends and early deterioration recognition | Longitudinal physiologic trends outperform single measurements for predicting mortality and organ failure | Supports continuous monitoring, trend‐based assessment and early escalation by nurses | Strengthens early warning systems and shift‐level prioritisation skills | [ |
| Machine‐learning‐supported clinical decision‐making | ML models improve prediction of mortality, respiratory failure and AKI | Enables nurse‐informed use of decision‐support and risk alerts | Enhances evidence‐guided decision‐making under uncertainty | [ |
| Frailty, physiologic reserve and risk stratification | Functional reserve and frailty strongly modify outcomes across diagnoses | Encourages holistic assessment beyond organ‐specific parameters | Guides care prioritisation and monitoring intensity | [ |
| Multisystem pathophysiology and systems‐based assessment | Critical illness reflects interacting metabolic, inflammatory, renal and hepatic dysfunction | Promotes integrated, systems‐based nursing surveillance | Reinforces holistic ICU assessment frameworks | [ |
| Preventive care and patient safety | Vaccination, anticoagulation and glucose stability reduce mortality risk | Highlights nursing roles in prevention, adherence and safety initiatives | Aligns with quality improvement and patient safety competencies | [ |
| Shift‐level risk evolution and prioritisation | Patient risk evolves over hours rather than days | Validates nurse‐led reassessment, prioritisation and rapid escalation | Supports dynamic workload and acuity management models | [ |
| Precision‐informed nursing care | Individualised risk stratification improves targeting of monitoring and interventions | Strengthens nursing contributions to personalised critical care | Advances precision nursing and data‐driven care models | [ |
| Interdisciplinary collaboration | Data‐driven insights support shared clinical decision‐making | Enhances nurse–physician communication and coordinated care | Reinforces team‐based, data‐informed ICU practice | [ |

| Study | Clinical task/outcome | Main algorithms/models | Feature processing and selection | Interpretability/fairness methods |
|---|---|---|---|---|
| Meng et al. [ | General ICU outcome prediction; interpretability and fairness | LSTM, transformer, temporal convolutional networks, IMV‐LSTM | Data truncation, data aggregation, missing‐value imputation, normalisation, categorical encoding | Integrated Gradients, DeepLIFT, SHAP, fairness gap metrics |
| Gupta et al. [ | Data pipeline design for MIMIC‐IV | Modular ML pipeline (supports random forest, logistic regression, gradient boosting, XGBoost, recurrent neural network, LSTM, temporal neural network and transformers) | Missing‐value imputation, temporal feature extraction, cohort filtering | Modular interpretability hooks (e.g., ROC‐based metrics and fairness modules) |
| Lin et al. [ | Acute kidney injury in acute pancreatitis | Random forest, SVM, KNN, neural networks, linear model, naive Bayes, gradient boosting | Feature selection via RF importance, scaling | Feature importance analysis |
| Tian et al. [ | Acute kidney injury in liver cirrhosis | Random forest, XGBoost, LightGBM, gradient boosting decision tree | Univariate selection, clinical feature curation | Feature importance analysis and ROC‐based model comparison |
| Sun et al. [ | ICU cardiac arrest mortality | Logistic regression, LASSO, XGBoost | Stepwise regression, multi‐collinearity checks | Nomogram |
| Röhr et al. [ | Benchmarking clinical outcome prediction | BERT | Unified pre‐processing benchmark for MIMIC‐IV | AUC values per category |
| Hempel et al. [ | ICU length of stay | XGBoost, random forest, SVM, logistic regression | Feature aggregation of vitals/labs | Feature importance analysis + partial dependence |
| Pang et al. [ | ICU mortality | XGBoost, logistic regression, SVM, decision tree | Feature selection based on ROC curve‐based metrics like AUC | Feature importance analysis + SHAP |
| Lin et al. [ | 30‐day mortality in myocardial infarction | XGBoost, random forest, logistic regression | Feature selection based on ROC curve‐based metrics like AUC | Feature importance analysis + nomogram |
| Wu et al. [ | Delirium in older adult COPD patients | Random forest, XGBoost, logistic regression, SVM | Feature selection based on LASSO regression and the best subset method | SHAP global and local interpretation |
| Li et al. [ | In‐hospital mortality in acute heart failure | Random forest, XGBoost, SVM, KNN, decision trees | Feature selection based on LASSO regression | Feature importance analysis + calibration curves |
| Han et al. [ | Sepsis‐associated encephalopathy | XGBoost, LightGBM, CatBoost, multilayer perceptron, SVM | Feature selection based on LASSO regression and Boruta methods | SHAP summary plots |
| Xie et al. [ | In‐hospital death in severe diabetic ketoacidosis | XGBoost, logistic regression, Bayesian information criterion | Multivariate logistic regression filtering | Feature importance analysis + nomogram |
| Hu et al. [ | Sepsis‐associated liver injury | Multivariate logistic regression (LASSO) | Multi‐collinearity screening | Nomogram |
| Shi et al. [ | Prolonged ICU stay in diabetic ketoacidosis | Logistic regression (LASSO) | Feature selection via multivariate analysis | Nomogram |

| Study/outcome | Model type | Key predictors/feature categories | Validation metrics/performance | Cohort/sample notes |
|---|---|---|---|---|
| Lin et al. [ | Nomogram built from logistic regression (after ML screening) | Age, blood urea nitrogen, heart rate, SpO2, bicarbonate, metoprolol use (among others) | In validation set: AUC = 0.835 (95% CI 0.774–0.897); good calibration; accuracy ~0.914 versus SOFA score AUC 0.735 | Patients with myocardial infarction admitted to CCU, extracted from MIMIC‐IV |
| Hu et al. [ | Nomogram based on LASSO + multivariate logistic regression | Six final variables identified (e.g., bilirubin, INR, others) | Training set AUC = 0.814, validation set AUC = 0.809; calibration curves; decision curve analysis; compared versus SAPS II (0.798) and SOFA (0.634) | Older adult ICU patients with sepsis and liver injury (defined by Total Bilirubin > 2 mg/dL, INR > 1.5) from MIMIC‐IV; n_training = 653, n_validation = 281 |
| Peng et al. [ | Nomogram (logistic regression) | Classical myocardial infarction predictors | Performance metrics reported and validated (AUC, calibration) in the internal validation cohort | Retrospective cohort from MIMIC‐IV, ~4688 MI patients |
1 Introduction
The digitisation of healthcare has produced large volumes of clinical data that support advances in medical research, nursing practice and patient care [1]. Electronic health records (EHRs) are now central to observational studies, predictive modelling and clinical decision support. Among publicly available resources, the MIMIC‐IV database has become a major dataset, offering deidentified ICU data such as vital signs, labs, interventions and clinical notes [2]. These elements are particularly relevant to critical care nursing, which depends on continuous monitoring, interpretation of physiologic trends and timely decision‐making.
MIMIC‐IV extends earlier MIMIC versions and provides a high‐resolution dataset for studying ICU outcomes, developing machine learning models and examining disease progression [2, 3]. Its breadth has supported research in mortality prediction, sepsis, acute kidney injury, cardiovascular disorders and other analytics‐driven domains aligned with nursing priorities such as early deterioration detection, complication prevention and optimising care interventions.
As research using MIMIC‐IV accelerates, understanding the dataset's scientific influence is increasingly important [4]. Examining publication trends, citation patterns and thematic focus areas helps characterise its academic impact and identify opportunities to strengthen evidence‐based practice through clinical data [5, 6]. This is especially valuable in critical care nursing, where high‐quality data support patient safety, refined assessments and advanced bedside analytics.
This review analyses the bibliographic metadata of MIMIC‐IV research indexed in Web of Science's SCI‐Expanded to characterise research topics and their relevance to data‐driven medicine and ICU nursing. We first describe the MIMIC‐IV database (Section 2), explain our analytical approach (Section 3), present and discuss findings (Section 4) and conclude with future directions for MIMIC‐IV research and this review (Section 5).
2 Background
Medical Information Mart for Intensive Care‐IV (MIMIC‐IV) is a comprehensive, deidentified electronic health record (EHR) dataset developed by MIT and released through PhysioNet. Available as a relational SQL database and via Google BigQuery, it enables efficient querying with standard SQL tools [7]. The dataset includes information from thousands of patients admitted between 2008 and 2019, covering vital signs, laboratory results, diagnoses, procedures, medications and free‐text clinical notes [2]. Its scope supports diverse research applications, from predictive modelling to clinical workflow analysis.
MIMIC‐IV is organised into four major modules: Core, Hospital, ICU and Emergency Department (ED), each representing different components of patient care. The core module contains demographics and admission details, the hospital module includes diagnoses and procedures, the ICU module provides high‐resolution time‐series data and the ED module captures emergency encounters. These modules are linked through standard identifiers (subject_id for patients, hadm_id for hospital admissions, stay_id for ICU stays), allowing integrated, longitudinal analyses across care settings [2].
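The linkage described above can be illustrated with a minimal sketch that uses Python's built-in `sqlite3` module. It assumes three toy tables named after the MIMIC‐IV `patients`, `admissions` and `icustays` tables, with only a few columns each, and it uses `stay_id` as the ICU-stay key, which is the name used in the released MIMIC‐IV schema. All rows are invented for illustration and do not come from the database.

```python
import sqlite3

# Toy in-memory stand-in for three MIMIC-IV tables; every row is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients   (subject_id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE admissions (hadm_id INTEGER PRIMARY KEY, subject_id INTEGER);
CREATE TABLE icustays   (stay_id INTEGER PRIMARY KEY, hadm_id INTEGER,
                         subject_id INTEGER, los REAL);
INSERT INTO patients   VALUES (1, 'F'), (2, 'M');
INSERT INTO admissions VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO icustays   VALUES (100, 10, 1, 2.5), (101, 11, 1, 1.0),
                              (200, 20, 2, 4.0);
""")


def icu_stays_per_patient(connection):
    """Count ICU stays and total ICU days per patient by chaining the keys."""
    query = """
        SELECT p.subject_id,
               COUNT(i.stay_id) AS n_stays,
               SUM(i.los)       AS icu_days
        FROM patients AS p
        JOIN admissions AS a ON a.subject_id = p.subject_id
        JOIN icustays   AS i ON i.hadm_id = a.hadm_id
        GROUP BY p.subject_id
        ORDER BY p.subject_id
    """
    return connection.execute(query).fetchall()


rows = icu_stays_per_patient(conn)
print(rows)
```

Joining patient to admission on `subject_id` and admission to ICU stay on `hadm_id` is what allows one patient's repeated admissions and ICU stays to be followed longitudinally.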
As an open database, MIMIC‐IV facilitates transparent, reproducible research and serves as a key resource for developing clinical prediction models, studying disease trajectories and evaluating interventions [7]. It has supported numerous studies on outcomes such as mortality and sepsis, underscoring its significant role in advancing data‐driven critical care research [4].
3 Study Design and Methods
This study analysed 1150 publications retrieved from the Web of Science Core Collection (SCI‐Expanded). Explicit search strategies, front‐page filtering and publication counts were used to identify and analyse keyword‐based research themes.
3.1 Data Source and Search Strategy
This review employed a structured bibliometric approach to systematically identify, screen and analyse publications related to the MIMIC‐IV clinical database. The methodology comprised three main stages: (1) data retrieval from the Web of Science Core Collection's Science Citation Index Expanded (SCI‐EXPANDED), (2) eligibility screening and filtering, and (3) thematic classification of research topics based on bibliographic metadata. An overview of the approach is presented in Figure 1, while the detailed procedures are provided in the following subsections.
FIGURE 1 | Process for the identification of the MIMIC‐IV research themes.
3.2 Topic, Scope and Eligibility
This review analysed publications related to the MIMIC‐IV clinical database using records retrieved from the SCI‐EXPANDED database in the Web of Science Core Collection (WoSCC). Searches were conducted in the Topic (TS) field, which includes title, abstract, author keywords and Keywords Plus, for publications from 2021 to 2024.
The exact WoS query was:
TS = (‘MIMIC‐IV’ OR ‘MIMIC IV’ OR ‘Medical Information Mart for Intensive Care IV’ OR ‘Medical Information Mart for Intensive Care (MIMIC) IV’).
Eligible document types included articles, reviews, early‐access items and conference papers. All data were extracted on 20 August 2025. A PRISMA‐style workflow is shown in Figure 2.
FIGURE 2 | Flow chart for the search process.
The Web of Science, specifically SCI‐EXPANDED, was selected as the sole source database for this analysis due to its strength in indexing high‐impact, interdisciplinary research across the sciences, social sciences and, crucially, the multidisciplinary domain of biomedical informatics where much MIMIC‐IV research is published [8]. Its curated citation index and consistent indexing practices support robust bibliometric and citation‐based analyses, which are central to this review's aims of mapping influential research themes and trends [9]. While databases such as PubMed offer excellent coverage of clinical and nursing literature [10], SCI‐EXPANDED provides a focussed lens on the broader, citation‐active scientific conversation around a complex resource like MIMIC‐IV, which attracts significant contributions from computational, informatics and engineering fields alongside clinical disciplines [8]. We acknowledge that this choice introduces a potential coverage bias, as some nursing‐specific or clinically focused journals may be less represented in SCI‐EXPANDED than in PubMed or Scopus [10]. Consequently, the findings may under‐represent certain practice‐orientated nursing studies [11]. However, for the purpose of identifying dominant, cross‐disciplinary research themes and the trajectory of a technically intensive dataset like MIMIC‐IV within the broader scientific literature, Web of Science offers a strategically appropriate and methodologically consistent scope [8]. This approach enhances transparency by clearly defining the corpus as the set of MIMIC‐IV studies embedded within the mainstream, citation‐linked scientific record.
3.3 Screening
The initial search retrieved 1180 records. A front‐page filter [12, 13] was applied, retaining records in which ‘MIMIC‐IV’ or its variants appeared in the title, abstract or author keywords. This OR‐based logic avoids overly restrictive filtering while removing false positives, especially those retrieved only through Keywords Plus (e.g., due to the generic term ‘mimic’).
Two records were excluded because the term ‘mimic’ appeared only in the Keywords Plus field. Non‐research items (e.g., reviews, editorials, letters) were then removed. After filtering, 1150 research articles remained, as shown in Figure 2.
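The front‐page filter can be sketched as a short Python function. This is an illustrative reading of the rule described above, not the authors' implementation: the regular expression, the record layout (a dictionary with `title`, `abstract`, `author_keywords` and `keywords_plus` fields) and the sample records are all assumptions made for the example.

```python
import re

# Assumed pattern for the query variants: "MIMIC" followed by an ASCII hyphen,
# a U+2010 hyphen or a space and then "IV", or the spelled-out database name.
MIMIC_IV = re.compile(
    r"\bMIMIC[-\u2010 ]IV\b"
    r"|Medical Information Mart for Intensive Care (\(MIMIC\) )?IV\b",
    re.IGNORECASE,
)

FRONT_PAGE_FIELDS = ("title", "abstract", "author_keywords")


def passes_front_page_filter(record):
    """Keep a record if ANY front-page field mentions MIMIC-IV (OR logic).

    Keywords Plus is deliberately not inspected, so a record that matched
    the search only through that field is dropped.
    """
    return any(MIMIC_IV.search(record.get(field, "")) for field in FRONT_PAGE_FIELDS)


# Hypothetical records, invented for illustration only.
records = [
    {"title": "Sepsis mortality in MIMIC-IV", "abstract": "", "author_keywords": ""},
    {"title": "Peptides that mimic insulin", "abstract": "", "author_keywords": "",
     "keywords_plus": "MIMIC; IV"},
    {"title": "AKI cohort",
     "abstract": "Data from the Medical Information Mart for Intensive Care IV.",
     "author_keywords": ""},
]
kept = [r for r in records if passes_front_page_filter(r)]
print([r["title"] for r in kept])
```

The second record stands in for the kind of false positive described above: it would be retrieved by a Topic search through Keywords Plus but contains no MIMIC‐IV mention in its title, abstract or author keywords.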
3.4 Data Processing and Standardisation
Bibliographic records were downloaded in plain‐text format and processed in Microsoft Excel 365 [14, 15].
Research themes were identified using word‐frequency analysis of titles, abstracts, author keywords and Keywords Plus [5, 6]. The most common keywords across all types of bibliographic metadata were compiled into a word bank. The top keywords appearing in titles or as author keywords were then identified and grouped into research themes. The word bank and the retrieved top keywords are available as Supporting Information to this manuscript. This frequency‐based descriptive approach was selected over keyword co‐occurrence or network analysis to align with the review's primary aim of providing a transparent, high‐level mapping of the dominant research landscape. While co‐occurrence analysis excels at revealing relational structures and clusters among concepts [16], the frequency method offers a more direct and interpretable overview of the most prominent, standalone topics within the corpus, which is essential for identifying broad, nursing‐relevant themes.
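A frequency count of this kind can be sketched in a few lines of Python. The example below is only an illustration: the authors processed their records in Microsoft Excel, the record layout is hypothetical, and the choice to express each percentage relative to the number of records that have the field at all is an assumption about how the Rank (%) column is derived.

```python
from collections import Counter


def keyword_table(records, field="author_keywords", top_n=20):
    """Rank keywords by number of publications (TP) and percentage share.

    Assumption: the percentage is relative to the records that actually
    contain the field, not to the whole corpus.
    """
    with_field = [r[field] for r in records if r.get(field)]
    counts = Counter()
    for keywords in with_field:
        # Count each keyword at most once per publication, ignoring case.
        counts.update({k.strip().lower() for k in keywords})
    total = len(with_field)
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return [(kw, tp, round(100 * tp / total, 1)) for kw, tp in ranked]


# Hypothetical records, invented for illustration only.
records = [
    {"author_keywords": ["Mortality", "Sepsis"]},
    {"author_keywords": ["mortality", "Machine learning"]},
    {"author_keywords": ["Sepsis", "Mortality"]},
    {"author_keywords": []},  # no author keywords: excluded from the denominator
]
table = keyword_table(records)
print(table)
```

Counting per publication rather than per occurrence keeps a paper that repeats a term from inflating that term's rank.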
The grouping of high‐frequency terms into coherent research themes was performed manually by domain experts in critical care and nursing informatics. This expert‐led, qualitative clustering approach, consistent with principles of qualitative content analysis, was chosen to prioritise clinical relevance and interpretive validity over purely algorithmic grouping. Experts reviewed the term list and iteratively grouped related terms into thematic categories based on semantic meaning and their relevance to critical care practice. Thematic saturation was assessed through this iterative process; clustering continued until no new substantive categories emerged and all high‐frequency terms were meaningfully accounted for within the established thematic framework. This method ensures that the resulting themes are not only data‐driven but also clinically meaningful and actionable for nursing.
4 Results
Keyword analyses identified mortality prediction, sepsis, acute kidney injury, intensive care workflows, and machine learning as dominant research areas, many of which are directly relevant to nursing‐sensitive outcomes and bedside clinical decision‐making.
A total of 1150 MIMIC‐IV‐related articles were analysed to identify research foci. Author keywords were available in 90% of the publications, providing a strong basis for theme extraction. Excluding search terms, Table 1 lists the top 20 author keywords and title words. From 2021 to 2024, terms such as ‘mortality’, ‘sepsis’, ‘intensive care unit’, ‘machine learning’ and ‘acute kidney injury’ were consistently dominant. The frequent appearance of ‘machine learning’ and ‘artificial intelligence’ reflects broader trends in computer science, where large clinical datasets increasingly rely on deep learning and data‐driven methods [17, 18].
The major research themes formed by these keywords are shown in Figure 3 and align with prior findings in open medical database research [4], biomedical data mining [19], clinical natural language processing [20], bioinformatics [21] and AI in biomedicine [22, 23, 24]. The critical care nursing translation of these research themes is outlined in Table 2.
FIGURE 3 | Major research themes about MIMIC‐IV.
The following sections summarise the state of the art within each theme, grounded in the bibliometric evidence from the 2021–2024 corpus. Several 2025 studies are referenced only to contextualise these trends and clarify ongoing methodological and clinical developments; they are not part of the analysed dataset and do not extend the review's publication window.
5 Discussion
The findings highlight that MIMIC‐IV research integrates clinical biomarkers, physiologic trends, and machine learning models to inform precision risk stratification, supporting proactive, data‐driven critical care nursing interventions.
5.1 Mortality Prediction, Risk Stratification and Disease Prognosis
Recent studies leveraging the MIMIC‐IV database have identified the triglyceride–glucose (TyG) index and its composite measure, TyG‐BMI, as promising predictors of mortality among critically ill patients across diverse cardiovascular and metabolic conditions [25, 26]. Lower TyG‐BMI levels were associated with significantly higher 1‐year all‐cause mortality in heart failure patients [26]. In contrast, higher TyG index values correlated with increased in‐hospital and ICU mortality in sepsis [27] and predicted higher mortality across all follow‐up intervals in haemorrhagic stroke [28]. Hu et al. extended these findings to atrial fibrillation, showing an ‘L‐shaped’ inverse relationship between TyG‐BMI and mortality at multiple timepoints [29]. These data suggest that TyG‐based metrics serve as accessible, non‐invasive biomarkers integrating lipid–glucose metabolism and insulin resistance for prognostic risk stratification in critical care. Although TyG‐derived indices are consistently prognostic in critical illness, the direction of the association is heterogeneous: higher values appear protective in cardiac dysfunction yet deleterious in sepsis and stroke. This implies disease‐specific metabolic dynamics and differential roles of insulin resistance across pathophysiological contexts. Current research gaps include the absence of prospective validation and the lack of clear cut‐off thresholds for clinical application.
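For readers unfamiliar with these indices, the sketch below computes them using the definitions most commonly cited in the literature: TyG = ln(fasting triglycerides × fasting glucose / 2), with both values in mg/dL, and TyG‐BMI = TyG × BMI. These formulas are not stated in this review and are given here as an assumption; individual MIMIC‐IV studies may use different units or measurement windows. The function names and input values are illustrative.

```python
import math


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """TyG index as ln(triglycerides x glucose / 2), both in mg/dL (assumed definition)."""
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("triglycerides and glucose must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


def tyg_bmi(triglycerides_mg_dl, glucose_mg_dl, weight_kg, height_m):
    """TyG-BMI as the TyG index multiplied by body mass index in kg/m^2."""
    bmi = weight_kg / height_m ** 2
    return tyg_index(triglycerides_mg_dl, glucose_mg_dl) * bmi


# Illustrative values only; not taken from any cohort.
example_tyg = tyg_index(150, 100)
example_tyg_bmi = tyg_bmi(150, 100, weight_kg=80, height_m=2.0)
print(round(example_tyg, 2), round(example_tyg_bmi, 1))
```

Because TyG‐BMI scales with body mass index, an inverse association with mortality may partly reflect body mass rather than glucose–lipid metabolism alone, which is one reason the direction of association can differ from that of the plain TyG index.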
Studies leveraging the MIMIC‐IV database have demonstrated the prognostic value of the stress hyperglycaemia ratio (SHR) in critically ill populations across cardiovascular and cerebrovascular contexts. Elevated SHR has been independently associated with short‐ and long‐term all‐cause mortality [30]. Across diverse cohorts, including atrial fibrillation [31], acute myocardial infarction, coronary heart disease [32, 33, 34] and cerebrovascular disease [35], high SHR values predicted greater ICU, in‐hospital, and 1‐year mortality, often displaying U‐ or J‐shaped associations with survival. Several analyses found that non‐diabetic patients were vulnerable to stress hyperglycaemia–related mortality. Most studies applied Cox proportional hazards models, logistic regression, restricted cubic spline analyses and Kaplan–Meier survival curves, reinforcing the robustness of the findings. The literature converges on SHR as a reproducible, non‐invasive biomarker linking stress‐induced hyperglycaemia to adverse outcomes in ICU patients. However, heterogeneity exists regarding optimal cut‐off values.
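As a worked illustration, the sketch below computes SHR using a widely cited definition: admission glucose divided by the estimated average glucose derived from HbA1c, where estimated average glucose (mg/dL) = 28.7 × HbA1c (%) − 46.7. This formula is not given in the review and is assumed here; the cited studies may differ in units or in which glucose measurement they use. The function name and example values are illustrative.

```python
def stress_hyperglycaemia_ratio(admission_glucose_mg_dl, hba1c_percent):
    """SHR as admission glucose over HbA1c-derived average glucose (assumed definition)."""
    # Estimated average glucose in mg/dL from HbA1c in percent.
    estimated_average_glucose = 28.7 * hba1c_percent - 46.7
    if estimated_average_glucose <= 0:
        raise ValueError("HbA1c too low for the estimated average glucose formula")
    return admission_glucose_mg_dl / estimated_average_glucose


# Illustrative values only: HbA1c of 6.0% gives an estimated average of 125.5 mg/dL.
example_shr = stress_hyperglycaemia_ratio(251, 6.0)
print(round(example_shr, 2))
```

Normalising by chronic glycaemia is what lets the ratio separate acute stress hyperglycaemia from long‐standing poor control, which is consistent with the observation above that non‐diabetic patients can carry substantial risk.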
The current literature positions delirium as a prognostic biomarker. Studies provide converging evidence that delirium significantly increases short‐term mortality and interacts with key physiological markers. Zhang et al. developed a predictive model identifying five independent mortality predictors in patients with sepsis or sepsis‐associated delirium [36]. Complementarily, Liu et al. analysed 22 361 older adult ICU patients and found delirium in approximately 24% of cases, confirming it as an independent predictor of in‐hospital mortality after adjustment via propensity score matching [37]. Logistic regression revealed significant interactions between delirium, SOFA score and haemoglobin levels, with mortality risk attenuated in patients with SOFA > 12 or haemoglobin > 15 g/dL. Together, these studies underscore the complex interplay between delirium, organ dysfunction, and metabolic derangements in critically ill populations [37].
Emerging evidence from MIMIC‐IV‐based studies highlights the prognostic role of systemic inflammatory markers. NLR, PLR and SII were independently associated with all‐cause mortality in patients with atrial fibrillation [38]. These findings support the use of simple haematological ratios as cost‐effective biomarkers for mortality risk stratification in AF. Complementarily, Jiang et al. investigated 16 007 septic ICU patients and found a J‐shaped association between SII and 28‐day mortality, with both low and high SII values conferring elevated risk [39]. These studies establish inflammation‐derived indices as robust predictors of adverse outcomes, reflecting immune dysregulation and systemic stress in critical illness.
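The three ratios named above can be computed from a routine complete blood count. The sketch below uses the standard definitions, which the review does not spell out and which are therefore stated here as assumptions: NLR is neutrophils divided by lymphocytes, PLR is platelets divided by lymphocytes, and SII is platelets × neutrophils divided by lymphocytes, with all counts in the same units (for example 10^9/L). The function name and input values are illustrative.

```python
def inflammation_indices(neutrophils, lymphocytes, platelets):
    """Return NLR, PLR and SII from absolute counts given in the same units."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return {
        "NLR": neutrophils / lymphocytes,              # neutrophil-to-lymphocyte ratio
        "PLR": platelets / lymphocytes,                # platelet-to-lymphocyte ratio
        "SII": platelets * neutrophils / lymphocytes,  # systemic immune-inflammation index
    }


# Illustrative counts only (10^9/L); not taken from any cohort.
indices = inflammation_indices(neutrophils=6.0, lymphocytes=1.5, platelets=300.0)
print(indices)
```

Since all three indices share the lymphocyte count as denominator, they are correlated by construction, which is worth keeping in mind when several of them are reported as independent predictors in the same model.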
Recent MIMIC‐IV‐based studies have explored other diverse prognostic biomarkers emphasising metabolic, haemodynamic and inflammatory pathways. Glycaemic variability predicts poorer consciousness and higher in‐hospital mortality among traumatic brain injury patients [40]. Similarly, the haemoglobin glycation index (HGI) [41] demonstrated a U‐shaped association with mortality in critically ill coronary artery disease patients, underscoring the prognostic relevance of dysregulated glucose metabolism. In cardiovascular settings, estimated plasma volume status (ePVS) and frailty scores emerged as independent predictors of short‐ and long‐term mortality in myocardial infarction and heart failure, with frailty notably enhancing predictive models beyond conventional risk scores [42]. In sepsis, renal mean perfusion pressure [43] and organism type [44] were strongly linked to outcomes, while timely vasopressin initiation [45] and aspirin therapy [46] showed protective associations, improving survival in septic shock and sepsis‐associated acute kidney injury, respectively. Additionally, serum ionised calcium [47] displayed a U‐shaped relationship with ischaemic stroke mortality, reinforcing the importance of electrolyte balance. Collectively, these investigations highlight the prognostic potential of routinely measurable clinical and laboratory parameters, promoting precision stratification in critical care through real‐world evidence. However, heterogeneity in measurement definitions, retrospective design, and limited external validation restrict clinical translation. The analysis of the MIMIC‐IV research outputs reveals a growing interest in non‐conventional biomarkers, underscoring the field's transition toward individualised pathophysiology‐driven models of investigation.
Collectively, this body of MIMIC‐IV‐derived evidence contributes to critical care nursing knowledge by demonstrating that routinely available metabolic, inflammatory, neurologic and haemodynamic biomarkers function as integrated indicators of physiologic stress, reserve and dysregulation rather than isolated laboratory abnormalities. These findings inform nursing assessment by supporting continuous, trend‐based physiologic surveillance that integrates glucose dynamics, lipid–metabolic signals, mental status changes, inflammatory burden and perfusion parameters to identify early, disease‐specific deterioration patterns. At the level of ICU shifts, this evidence reinforces nurses' role in anticipatory clinical decision‐making, prioritising high‐risk patients for intensified monitoring, timely escalation, and targeted interventions, while highlighting the need for contextual interpretation of biomarkers rather than reliance on single threshold values.
5.2 Critical Illness Management and Implications for Critical Care Nursing
To provide a comprehensive perspective on how MIMIC‐IV‐derived evidence informs prognosis, intervention strategies and nursing practice in critical care, findings from disease‐specific and system‐level investigations are synthesised across Tables 3 and 4. Together, these syntheses trace a progression from empirically derived predictors of adverse outcomes to their implications for time‐sensitive clinical judgement, nursing competencies and the advancement of critical care nursing knowledge.
The synthesis presented in Table 3 integrates metabolic and inflammatory biomarkers, therapeutic exposures, and predictive modelling approaches associated with mortality, organ dysfunction and other adverse outcomes across major critical illness domains, including sepsis and septic shock, acute kidney injury, cardiovascular conditions, acute pancreatitis and heterogeneous ICU populations. Rather than isolating individual predictors, this body of evidence illustrates how routinely collected physiologic, laboratory and treatment data converge to characterise high‐risk clinical trajectories. Across conditions, adverse outcomes emerge not from singular abnormalities but from the interaction of metabolic stress, inflammatory burden, physiologic reserve and the timing of therapeutic interventions. These relationships are discernible through the continuous and high‐resolution data captured within MIMIC‐IV.
Complementing these biomarker‐focussed findings, the integrated synthesis in Table 4 broadens the analytic lens to include non‐biomarker dimensions of critical illness and their relevance to nursing practice and education. This evidence highlights the importance of therapeutic timing, longitudinal physiologic trends, machine learning–enabled risk stratification, multisystem pathophysiology, preventive and modifiable exposures and the evolution of patient risk over the course of hours rather than days. Collectively, these insights emphasise that critical illness is inherently dynamic and process‐driven, shaped as much by the temporal pattern of physiologic change and clinical response as by the presence of derangement itself.
When situated within a nursing framework, these findings underscore the centrality of continuous surveillance, trend interpretation and timely escalation of care. The alignment of MIMIC‐IV evidence with core nursing learning outcomes demonstrates how large‐scale ICU data analytics directly support essential practice domains, including early recognition of clinical deterioration, systems‐based physiologic assessment, risk stratification and prioritisation, precision‐informed monitoring, prevention and patient safety and interdisciplinary collaboration. In this context, nurses emerge not merely as data recipients but as critical interpreters and integrators of evolving physiologic, metabolic and functional information at the bedside.
Taken together, this synthesis illustrates how MIMIC‐IV research advances critical care by linking biomarker discovery, therapeutic processes and nursing practice within a unified, data‐informed framework. The convergence of metabolic and inflammatory indicators, dynamic physiologic trends, treatment timing and predictive modelling supports a shift away from static, diagnosis‐centred assessment toward continuous, systems‐oriented and precision‐informed care. For critical care nursing, these insights reinforce the value of leveraging real‐time and longitudinal data to anticipate deterioration, prioritise interventions, and contribute meaningfully to interdisciplinary clinical decision‐making in the intensive care unit.
5.3 Models and Algorithms for Data Processing
Research leveraging the MIMIC‐IV database employs a diverse array of modelling frameworks and algorithmic strategies, encompassing both deep learning interpretability approaches and traditional machine learning pipelines. Meng et al. [80] developed MIMIC‐IF, a MIMIC‐IV‐based framework to evaluate model interpretability (i.e., the extent to which a model's predictions can be understood by humans) and fairness (i.e., the extent to which algorithms avoid discriminatory behaviour towards specific individuals or groups) across deep neural architectures, including LSTM, Transformer, temporal convolutional networks (TCN) and IMV‐LSTM, trained on time‐series and static clinical data for mortality prediction. Their interpretability assessment integrates post hoc methods such as Integrated Gradients, DeepLIFT and SHAP, while fairness is quantified through subgroup bias metrics across demographic attributes. This dual focus on explainability and fairness distinguishes MIMIC‐IF as a meta‐framework for evaluating clinical AI transparency rather than a single predictive model. Complementing this interpretability‐driven research, Gupta et al. [81] developed a modular MIMIC‐IV data processing pipeline that standardises cohort construction, imputation, temporal feature extraction and model evaluation, serving as a generalisable foundation for subsequent predictive modelling efforts.
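As an illustration of what a subgroup bias metric can look like, the sketch below computes the area under the ROC curve separately for each demographic group and reports the gap between the best‐ and worst‐served groups. This is a deliberately simple stand‐in chosen for clarity: the text does not specify which metrics Meng et al. used, and the function names, labels, scores and group codes are all hypothetical.

```python
def auc(labels, scores):
    """AUROC by pairwise comparison of positives and negatives; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("each group needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auc_gap(labels, scores, groups):
    """Return per-group AUROC and the difference between the best and worst group."""
    by_group = {}
    for y, s, g in zip(labels, scores, groups):
        ys, ss = by_group.setdefault(g, ([], []))
        ys.append(y)
        ss.append(s)
    aucs = {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}
    return aucs, max(aucs.values()) - min(aucs.values())


# Toy data: 1 = died, 0 = survived; scores are hypothetical model outputs.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3, 0.6, 0.7, 0.4, 0.2]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

aucs, gap = subgroup_auc_gap(labels, scores, groups)
print(aucs, gap)
```

A large gap indicates that the model discriminates risk better in one group than another, even when overall performance looks acceptable. The pairwise AUROC is quadratic in group size, so it suits small illustrations rather than full cohorts.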
Building upon these foundational pipelines, numerous clinical prediction studies using MIMIC‐IV have applied machine learning models to forecast outcomes such as acute kidney injury [64, 82], in‐hospital mortality [49, 50, 83] and length of ICU stay [48]. These studies typically follow a structured data processing pipeline: cleaning and merging time‐series tables, performing feature engineering and selection (often via random forest importance or recursive elimination), and comparing multiple algorithms such as logistic regression, random forest, support vector machines, gradient boosting and neural networks. Interpretability remains central to these works, with SHAP and LIME dominating as feature attribution techniques that help validate model findings against known physiological indicators. Röhr et al. [84] and others have further contributed benchmark frameworks that emphasise data harmonisation and reproducibility across predictive modelling pipelines, highlighting the importance of consistent preprocessing for fair model comparison. Table 5 summarises these diverse modelling strategies, identifying each study's algorithms, feature processing techniques and interpretability tools.
In simple terms, researchers using the MIMIC‐IV database follow similar steps when building prediction models. First, they clean and organise large amounts of patient data. Then, they use computer algorithms to look for patterns that can predict outcomes such as death, kidney injury or length of stay in intensive care. Many studies compare several types of models to find the most accurate one. Importantly, researchers also use special tools to explain how these models make decisions and to check that they do not treat certain patient groups unfairly. This focus on transparency and fairness helps ensure that AI systems can be trusted and understood in clinical practice.
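The cleaning and feature‐extraction step described above can be sketched as follows. The example collapses long‐format time‐series rows into one row of summary features per ICU stay and fills missing features with the cohort median. The row layout, variable names and the 24‐hour window are assumptions made for illustration; they do not reproduce the MIMIC‐IV schema or any cited pipeline.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical long-format rows: (stay_id, hour since admission, variable, value)
rows = [
    (1, 0, "heart_rate", 88.0),
    (1, 1, "heart_rate", 102.0),
    (1, 0, "creatinine", 1.4),
    (2, 0, "heart_rate", 70.0),
    (2, 2, "heart_rate", 74.0),
]


def first_day_features(rows, max_hour=24):
    """Summarise each variable per stay as min, max and mean over the window."""
    grouped = defaultdict(list)
    for stay_id, hour, variable, value in rows:
        if 0 <= hour < max_hour and value is not None:
            grouped[(stay_id, variable)].append(value)
    features = defaultdict(dict)
    for (stay_id, variable), values in grouped.items():
        features[stay_id][variable + "_min"] = min(values)
        features[stay_id][variable + "_max"] = max(values)
        features[stay_id][variable + "_mean"] = mean(values)
    return dict(features)


def impute_median(features):
    """Fill features missing for a stay with the median across stays that have them."""
    columns = sorted({c for f in features.values() for c in f})
    medians = {c: median(f[c] for f in features.values() if c in f) for c in columns}
    return {sid: {c: f.get(c, medians[c]) for c in columns}
            for sid, f in features.items()}


table = impute_median(first_day_features(rows))
for stay_id, feats in sorted(table.items()):
    print(stay_id, feats)
```

In a real study the medians would be estimated on the training split only and then applied to the test split, so that information from held‐out patients does not leak into model development.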
A substantial subset of MIMIC‐IV studies has employed nomogram‐based logistic regression models to enhance clinical interpretability while retaining statistical rigour. Lin et al. [71] constructed a nomogram predicting 30‐day mortality among myocardial infarction patients, achieving strong discriminative power (AUC = 0.835) with key predictors such as age, BUN and SpO₂. Similarly, Hu et al. [88] developed a LASSO‐selected logistic regression nomogram for sepsis‐associated liver injury mortality (AUC = 0.809), outperforming SOFA and SAPS II scores, whereas Peng et al. [90] applied comparable methods to myocardial infarction mortality. These models emphasise calibration, discrimination and clinical usability through visual decision aids. Table 6 details these nomogram frameworks, summarising their key variables, validation metrics and cohort characteristics. Together, these studies reveal that MIMIC‐IV's data processing ecosystem, spanning deep learning interpretability frameworks, modular ML pipelines and nomogram‐based logistic models, balances computational complexity with clinical transparency, forming a robust methodological backbone for critical care outcome prediction.
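A nomogram is a graphical rescaling of a fitted logistic regression: each predictor's contribution to the linear predictor is converted to points, and the total maps back to a probability. The sketch below illustrates that conversion under explicit assumptions. The intercept, coefficients and plotted ranges are invented for the example and are not those of Lin et al., Hu et al. or Peng et al.; only the predictor names (age, BUN, SpO₂) echo the text.

```python
import math

# Hypothetical model: every number here is invented for illustration and is
# NOT taken from any nomogram cited in the text.
INTERCEPT = -1.0
PREDICTORS = {
    # name: (beta, lowest plotted value, highest plotted value)
    "age_years": (0.04, 20.0, 100.0),
    "bun_mg_dl": (0.02, 5.0, 105.0),
    "spo2_pct": (-0.05, 80.0, 100.0),
}


def points_per_logit():
    """Scale so the predictor with the widest effect spans 0 to 100 points."""
    widest = max(abs(beta) * (hi - lo) for beta, lo, hi in PREDICTORS.values())
    return 100.0 / widest


def score(patient):
    """Return per-predictor points, total points and predicted probability."""
    scale = points_per_logit()
    points = {}
    for name, (beta, lo, hi) in PREDICTORS.items():
        # Measure from the lowest-risk end of the axis so points are >= 0
        # for values inside the plotted range.
        reference = lo if beta >= 0 else hi
        points[name] = scale * beta * (patient[name] - reference)
    linear_predictor = INTERCEPT + sum(
        beta * patient[name] for name, (beta, _, _) in PREDICTORS.items()
    )
    probability = 1.0 / (1.0 + math.exp(-linear_predictor))
    return points, sum(points.values()), probability


points, total, risk = score({"age_years": 60.0, "bun_mg_dl": 55.0, "spo2_pct": 90.0})
print(points, round(total, 3), round(risk, 4))
```

The points add no predictive information beyond the regression itself; their value is that a clinician can read risk from a printed chart without computing an exponential at the bedside.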
Collectively, this body of MIMIC‐IV‐based modelling research advances critical care nursing knowledge by demonstrating that robust outcome prediction can be achieved when advanced machine learning, deep learning and traditional statistical frameworks are paired with transparency, fairness and clinical interpretability. These findings inform nursing assessment and physiologic surveillance by validating that routinely collected time‐series data (vital signs, laboratory trends and organ function markers) contain actionable signals of deterioration that can be made interpretable and clinically meaningful through explainable AI and nomogram‐based tools. At the level of ICU shifts, this evidence supports nursing clinical decision‐making by reinforcing trust in data‐driven risk stratification, enabling earlier recognition of high‐risk trajectories, and facilitating nurse engagement with predictive outputs that align with physiologic reasoning rather than opaque black‐box predictions.
5.4 Limitations
The evidence is largely retrospective, with variable biomarker definitions and limited external validation, and it leaves gaps in ICU workflow, nursing‐sensitive outcomes and multimodal data. The review is further constrained by its exclusive use of the Web of Science Core Collection (SCI‐Expanded), which may have omitted relevant studies indexed in other databases.
6 Relevance to Clinical Practice
MIMIC‐IV supports the generation of evidence on essential nursing concerns. Recognising global research patterns enables nurses, clinicians, and informatics teams to identify emerging tools, prioritise data‐driven competencies, and translate large‐scale analytics into improved ICU care and patient outcomes.
7 Conclusions
This review provides the first focused mapping of MIMIC‐IV‐based research (2021–2024) using SCI‐Expanded data, revealing a clear division between mature, well‐established themes and emerging areas with substantial growth potential. Mature domains are characterised by high publication volume, methodological standardisation, and strong clinical relevance. These include mortality prediction and risk stratification using biomarkers and machine learning; sepsis and septic shock research addressing prognostic markers, therapies, and organ failure prediction; acute kidney injury modelling across critical conditions; cardiovascular disorders such as atrial fibrillation, heart failure and myocardial infarction; and data processing approaches that commonly employ standardised machine learning pipelines, interpretability tools (e.g., SHAP, LIME) and nomogram‐based regression models.
In contrast, several areas remain underdeveloped and present important opportunities for future, particularly nurse‐led, research. These include ICU workflows and care processes, nursing‐sensitive outcomes, patient experience and family‐centred care, health equity and disparities, intervention effectiveness and implementation science and the integration of multimodal data such as nursing notes and physiological waveforms. Addressing these gaps would shift MIMIC‐IV research beyond prediction towards practical, equitable and translational insights. Future work should therefore balance refinement of established models with exploratory studies that directly inform nursing practice, patient safety and care quality at the bedside.
Author Contributions
Y.‐S.H., A.B.S. and H.T. conceptualised the work. Y.‐S.H. did the data collection. Y.‐S.H., A.B.S., M.K., A.D. and H.T. did the investigation and wrote the main manuscript text. Y.‐S.H., A.B.S., Y.M., and H.T. reviewed the manuscript and supervised the research work.
Funding
The authors have nothing to report.
Ethics Statement
The authors have nothing to report.
Consent
The authors have nothing to report.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- 1. B. S. Glicksberg, K. W. Johnson, and J. T. Dudley, “The Next Generation of Precision Medicine: Observational Studies, Electronic Health Records, Biobanks and Continuous Monitoring,” Human Molecular Genetics 27, no. R1 (2018): R56–R62, 10.1093/hmg/ddy114. PMID: 29659828.
- 2. A. E. W. Johnson, L. Bulgarelli, L. Shen, et al., “MIMIC‐IV, a Freely Accessible Electronic Health Record Dataset,” Scientific Data 10, no. 1 (2023): 1, 10.1038/s41597-022-01899-x. PMID: 36596836; PMCID: PMC9810617.
- 3. C. M. Sauer, T. A. Dam, L. A. Celi, et al., “Systematic Review and Comparison of Publicly Available ICU Data Sets: A Decision Guide for Clinicians and Data Scientists,” Critical Care Medicine 50, no. 6 (2022): E581–E588, 10.1097/CCM.0000000000005517. PMID: 35234175; PMCID: PMC9150442.
- 4. Y. Ke, R. Yang, and N. Liu, “Comparing Open‐Access Database and Traditional Intensive Care Studies Using Machine Learning: Bibliometric Analysis Study,” Journal of Medical Internet Research 26 (2024): e48330, 10.2196/48330. PMID: 38630522; PMCID: PMC11063894.
- 5. C. C. Wang and Y. S. Ho, “Research Trend of Metal‐Organic Frameworks: A Bibliometric Analysis,” Scientometrics 109, no. 1 (2016): 481–513, 10.1007/s11192-016-1986-2.
- 6. G. F. Zhang, S. D. Xie, and Y. S. Ho, “A Bibliometric Analysis of World Volatile Organic Compounds Research Trends,” Scientometrics 83, no. 2 (2010): 477–492, 10.1007/s11192-009-0065-3.
- 7. R. Al Attrach, P. Moreira, R. Fani, R. Umeton, and L. A. Celi, “Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis,” 2025, arXiv preprint arXiv:2507.01053, 10.48550/arXiv.2507.01053.
- 8. C. Birkle, D. A. Pendlebury, J. Schnell, and J. Adams, “Web of Science as a Data Source for Research on Scientific and Scholarly Activity,” Quantitative Science Studies 1, no. 1 (2020): 363–376, 10.1162/qss_a_00018.
