# Protocol for processing multivariate time-series electronic health records of COVID-19 patients

**Authors:** Zixiang Wang, Yinghao Zhu, Dehao Sui, Tianlong Wang, Yuntao Zhang, Yasha Wang, Chengwei Pan, Junyi Gao, Liantao Ma, Ling Wang, Xiaoyun Zhang

PMC · DOI: 10.1016/j.xpro.2025.103669 · 2025-03-05

## TL;DR

This paper introduces a standardized protocol for processing complex electronic health records of COVID-19 patients to improve AI-based predictions of hospital outcomes.

## Contribution

A detailed, reproducible protocol for standardizing and processing multivariate time-series EHR data for AI model training in the context of COVID-19.

## Key findings

- The protocol includes steps for data standardization, formatting, and model training.
- It focuses on predicting in-hospital mortality and length of stay for COVID-19 patients.
- The method aims to improve the accuracy of predictive models by addressing data processing inconsistencies.
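
To make the standardization step concrete, here is a minimal sketch of what turning raw visit records into a model-ready time series might look like. It is not the authors' pipeline: the field names (`PatientID`, `RecordTime`, `DischargeTime`, `HeartRate`), the forward-fill imputation, and the z-score normalization are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
from datetime import datetime
from itertools import groupby
from statistics import mean, pstdev


def standardize_ehr(records, feature):
    """Sort visits by patient and time, forward-fill `feature` within each
    patient, z-score it, and attach a remaining length-of-stay label (days).

    Assumes (hypothetically) that each record is a dict with PatientID,
    RecordTime, DischargeTime, and the named feature, and that the feature
    is observed at least once across all records.
    """
    rows = sorted(records, key=lambda r: (r["PatientID"], r["RecordTime"]))
    out = []
    for _, visits in groupby(rows, key=lambda r: r["PatientID"]):
        last = None  # last observed value for the current patient
        for r in visits:
            value = r[feature] if r[feature] is not None else last
            last = value
            # Length-of-stay label: days from this visit until discharge.
            los = (r["DischargeTime"] - r["RecordTime"]).total_seconds() / 86400
            out.append({**r, feature: value, "LOS": los})

    observed = [r[feature] for r in out if r[feature] is not None]
    mu = mean(observed)
    sigma = pstdev(observed) or 1.0  # avoid dividing by zero for constant features
    for r in out:
        # Values with no earlier observation to carry forward map to 0.0,
        # which is the feature mean after z-scoring.
        r[feature] = 0.0 if r[feature] is None else (r[feature] - mu) / sigma
    return out


records = [
    {"PatientID": 1, "RecordTime": datetime(2020, 1, 2),
     "DischargeTime": datetime(2020, 1, 5), "HeartRate": None},
    {"PatientID": 1, "RecordTime": datetime(2020, 1, 1),
     "DischargeTime": datetime(2020, 1, 5), "HeartRate": 80.0},
    {"PatientID": 2, "RecordTime": datetime(2020, 1, 1),
     "DischargeTime": datetime(2020, 1, 4), "HeartRate": 100.0},
    {"PatientID": 2, "RecordTime": datetime(2020, 1, 3),
     "DischargeTime": datetime(2020, 1, 4), "HeartRate": None},
]
standardized = standardize_ehr(records, "HeartRate")
```

In a real train/test setup, the mean and standard deviation would normally be computed on the training split only and reused for validation and test data, so that no information leaks from held-out patients.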

## Abstract

The lack of standardized techniques for processing complex health data from COVID-19 patients hinders the development of accurate predictive models in healthcare. To address this, we present a protocol for utilizing real-world multivariate time-series electronic health records of COVID-19 patients. We describe steps covering the necessary setup, data standardization, and formatting. We then provide detailed instructions for creating datasets and for training and evaluating AI models designed to predict two key outcomes: in-hospital mortality and length of stay.

For complete details on the use and execution of this protocol, please refer to Gao et al.¹

- Steps for standardizing multivariate time-series EHR data format of COVID-19 patients
- Instructions for processing EHR data of COVID-19 patients for training AI models
- Guidance on training and evaluating AI models through tailored pipelines

Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11928838/full.md

---
Source: https://tomesphere.com/paper/PMC11928838